AI · Neutral · arXiv – CS AI · 6h ago · 7/10
Doing What They Say, Not What They Reason: Locating the Faithfulness Gap in LLM Agents
Researchers investigate whether large language model agents actually follow their stated reasoning when making decisions, using a Texas Poker simulator as a controlled test environment. The study locates a 'faithfulness gap' by decomposing agent behavior into two distinct steps: the step from reasoning to a stated conclusion, and the step from that conclusion to the action taken. The two steps turn out to behave in opposite ways, which raises concerns about LLM reliability in applications that require transparent decision-making.