A new position paper challenges the prevailing assumption that large language models reason through explicit chain-of-thought outputs, arguing instead that reasoning occurs primarily in latent-state trajectories hidden within model computations. The research separates three confounded factors and proposes that current reasoning benchmarks and interpretability claims need fundamental reevaluation based on this distinction.
The paper addresses a critical gap in how the AI research community studies LLM reasoning. For years, researchers have treated visible chain-of-thought (CoT) outputs as faithful representations of model reasoning, using them to design benchmarks, evaluate interpretability, and guide inference-time interventions. This work challenges that assumption by formalizing competing hypotheses about where reasoning actually occurs.
The shift from surface-level analysis to latent-state dynamics reflects broader maturation in mechanistic interpretability research. As tools for probing internal model states have improved, evidence suggests that explicit reasoning traces are often post-hoc rationalizations rather than the actual computational pathway. The paper's framework separates three previously conflated variables: surface traces (what models write), latent states (internal representations), and serial compute (additional processing steps).
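To make the probing idea concrete, here is a minimal, self-contained sketch of a linear probe, the standard tool for testing whether an internal representation encodes some feature. The setup is entirely synthetic (random vectors stand in for a model's hidden states, and `w_true` is a hypothetical direction encoding an intermediate result); it is not the paper's method, only an illustration of the technique.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for latent states: 1000 "hidden state" vectors of dimension 64.
# A single direction w_true linearly encodes a binary intermediate result,
# mimicking the hypothesis that latent states, not surface text, carry the
# reasoning signal. Both names are illustrative assumptions.
d, n = 64, 1000
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = (X @ w_true > 0).astype(float)  # ground-truth latent feature

# Train a logistic-regression probe with plain gradient descent.
w = np.zeros(d)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))  # probe's predicted probability
    w -= 0.1 * X.T @ (p - y) / n        # gradient step on the log-loss

# If the probe's accuracy is high, the feature is linearly decodable
# from the representation; on real models, X would be activations
# extracted at a chosen layer rather than random vectors.
acc = float(((X @ w > 0) == (y == 1)).mean())
```

In practice, interpretability work applies probes like this to activations captured at specific layers and token positions, and compares probe accuracy against what the visible chain-of-thought text would predict.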
This has significant implications for the field's infrastructure. Current reasoning benchmarks, such as those measuring mathematical problem-solving or logical deduction, may be measuring the wrong thing: they score output quality without revealing the mechanisms that actually drive performance. For developers building AI systems, this suggests that improving reasoning requires different optimization strategies than simply prompting for explicit step-by-step thinking.
The research points toward a necessary recalibration of AI safety and interpretability efforts. If reasoning is fundamentally latent, then efforts to steer or align model behavior through prompt engineering or output constraints may miss the actual computational processes that matter. The field must develop better tools for measuring and evaluating internal model dynamics rather than relying on surface outputs.
- LLM reasoning likely occurs in hidden latent states rather than explicit chain-of-thought outputs, challenging current interpretability assumptions.
- Current reasoning benchmarks may measure output quality without capturing actual computational mechanisms, potentially providing misleading performance signals.
- Three confounded factors (surface traces, latent states, and serial compute) must be disentangled in future reasoning research.
- Mechanistic interpretability research should prioritize latent-state dynamics as the primary object of study rather than visible reasoning traces.
- Safety and alignment efforts may be ineffective if they target surface-level outputs while ignoring the latent computations that actually drive model behavior.