Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens
A new arXiv study challenges the assumption that Chain of Thought reasoning traces in large language models reflect genuine internal reasoning processes. Researchers found that models trained on corrupted, semantically meaningless intermediate steps perform comparably to those trained on correct reasoning traces, suggesting that intermediate tokens function more as statistical patterns than transparent reasoning proxies.
This research tests a critical assumption underlying recent advances in reasoning models: that intermediate reasoning tokens (Chains of Thought) capture meaningful computational steps. In controlled experiments on transformer models, the authors found that models trained on intentionally corrupted traces, which have no semantic relationship to the problems they accompany, match or even exceed the performance of models trained on valid reasoning traces. This suggests that models extract value from intermediate tokens through statistical pattern matching rather than semantic understanding of reasoning logic.
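The corruption setup described above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes a dataset of (problem, trace, solution) triples and assumes that "corrupted" means each problem receives a trace borrowed from a different problem, while the problem and its solution stay untouched. The `Example` class, the `corrupt_traces` function, and the cyclic-shift swapping strategy are hypothetical names and choices introduced here.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One training example (hypothetical schema, not from the paper)."""
    problem: str
    trace: str
    solution: str


def corrupt_traces(examples, seed=0):
    """Return new examples where every problem carries another problem's trace.

    Problems and solutions are left unchanged; only the traces are reassigned.
    """
    n = len(examples)
    if n < 2:
        raise ValueError("need at least two examples to swap traces")

    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)

    # Shifting a shuffled order by 1..n-1 positions guarantees that no
    # example is assigned the trace at its own index.
    shift = rng.randrange(1, n)
    donor = {order[i]: order[(i + shift) % n] for i in range(n)}

    return [
        Example(ex.problem, examples[donor[i]].trace, ex.solution)
        for i, ex in enumerate(examples)
    ]
```

Training one model on the original examples and another on the output of `corrupt_traces`, then comparing solution accuracy, would mirror the kind of comparison the summary describes, under the assumptions stated above.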
The implications extend beyond academic interest. The AI research community has largely interpreted improvements from CoT training as evidence of genuine reasoning capability, with many practitioners viewing these traces as faithful representations of model cognition. This work cautions against that interpretation. The authors also find that GRPO-based reinforcement learning improves solution accuracy without improving trace validity. Taken together, the evidence suggests that solution generation is decoupled from reasoning trace quality: models optimize for outputs, not reasoning fidelity.
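One way to make this decoupling visible is to score solution correctness and trace validity as two separate metrics rather than one. The sketch below assumes each model output has already been judged by two independent checkers, yielding a pair of booleans. The `decoupling_table` function and its output keys are hypothetical and do not come from the study.

```python
from collections import Counter


def decoupling_table(records):
    """Summarize how often solution correctness and trace validity disagree.

    `records` is an iterable of (solution_correct, trace_valid) boolean pairs,
    one pair per model output.
    """
    counts = Counter((bool(s), bool(t)) for s, t in records)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no records to summarize")

    return {
        # Fraction of outputs whose final answer is correct.
        "solution_accuracy": (counts[(True, True)] + counts[(True, False)]) / total,
        # Fraction of outputs whose intermediate trace is valid.
        "trace_validity": (counts[(True, True)] + counts[(False, True)]) / total,
        # The two disagreement cells: these are what "decoupling" looks like.
        "correct_with_invalid_trace": counts[(True, False)] / total,
        "incorrect_with_valid_trace": counts[(False, True)] / total,
    }
```

If post-training raises `solution_accuracy` while `trace_validity` stays flat, the growth shows up in `correct_with_invalid_trace`, which is the pattern the paragraph above attributes to the GRPO experiments.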
For developers and researchers building reasoning systems, this creates a practical tension. CoT training demonstrably improves performance, but if intermediate steps may lack semantic integrity, they are a less reliable guide for interpreting model behavior and debugging failures. The observation that trace length bears little relationship to problem complexity further undermines treating these outputs as algorithmic descriptions. The research suggests the field should invest less in interpreting CoT traces as windows into model cognition and more in understanding the statistical mechanisms that drive the performance improvements. Future work must separate genuine reasoning capabilities from emergent statistical artifacts that merely correlate with correct solutions.
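The claim about trace length can be checked on any set of model outputs with a plain correlation coefficient. The sketch below is an assumption-laden illustration: it uses the Pearson correlation, takes trace length as a token count, and leaves the complexity measure to the reader (for example, the length of an optimal solution). The `pearson` function is written out here for self-containment and is not taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Sums of centered products and squares (unnormalized covariance/variance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)

    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")

    return cov / math.sqrt(var_x * var_y)
```

Calling `pearson(trace_lengths, complexities)` on paired measurements gives a value near zero when trace length carries little information about difficulty, and a value near 1 when longer traces track harder problems.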
- Models trained on semantically meaningless corrupted reasoning traces perform as well as those trained on correct traces, indicating statistical pattern matching rather than semantic reasoning.
- Reinforcement learning post-training increases solution accuracy without improving the validity or semantic coherence of intermediate reasoning steps.
- Trace length does not correlate with problem complexity, suggesting intermediate tokens function independently of underlying computational difficulty.
- Researchers caution against anthropomorphizing Chain of Thought outputs as evidence of human-like or algorithmic reasoning in language models.
- The findings challenge assumptions that intermediate tokens represent transparent proxies of model internal computational processes.