
Reasoning Models Struggle to Control their Chains of Thought

arXiv – CS AI | Chen Yueh-Han, Robert McCarthy, Bruce W. Lee, He He, Ian Kivlichan, Bowen Baker, Micah Carroll, Tomek Korbak | March 9, 2026 at 04:00 AM

🤖AI Summary

Researchers found that AI reasoning models struggle to control their chains of thought (CoT): Claude Sonnet 4.5 controlled its CoT only 2.7% of the time, compared with 61.9% for its final output. Such low controllability suggests that CoT monitoring remains viable for detecting AI misbehavior, although the mechanisms behind the limitation are poorly understood.

Key Takeaways

→ AI models are significantly less able to control their chain-of-thought reasoning than their final outputs.
→ Larger models exhibit higher CoT controllability, while more RL training and greater problem difficulty reduce it.
→ CoT controllability failures persist even when models are given incentives to evade monitoring systems.
→ Current low controllability suggests that CoT monitoring remains effective for detecting AI misbehavior.
→ The researchers recommend that frontier AI labs track CoT controllability as models advance.
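
The summary does not say how controllability is scored. One plausible reading is that a model is given a constraint on its text, and the controllability rate is the share of trials in which the constraint holds, measured separately for the CoT and for the final output. The sketch below illustrates that reading only. The `Trial` record, the banned-word constraint, and the toy data are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One hypothetical evaluation trial (illustrative, not the paper's schema)."""
    cot: str     # the model's chain-of-thought text
    output: str  # the model's final answer text


def avoids_word(text: str, banned: str) -> bool:
    """Toy constraint: the text must not contain the banned word (case-insensitive)."""
    return banned.lower() not in text.lower()


def controllability(trials: list[Trial], banned: str) -> tuple[float, float]:
    """Return (CoT rate, output rate): the share of trials satisfying the constraint."""
    if not trials:
        raise ValueError("need at least one trial")
    cot_ok = sum(avoids_word(t.cot, banned) for t in trials)
    out_ok = sum(avoids_word(t.output, banned) for t in trials)
    n = len(trials)
    return cot_ok / n, out_ok / n


# Made-up toy data, for illustration only.
trials = [
    Trial(cot="The triangle has sides 3, 4, 5.", output="The area is 6."),
    Trial(cot="Use the base and height.", output="The shape's area is 6."),
    Trial(cot="A right triangle, so area = 3*4/2.", output="Area of the triangle: 6."),
    Trial(cot="This triangle is right-angled.", output="6"),
]
cot_rate, out_rate = controllability(trials, banned="triangle")
print(f"CoT: {cot_rate:.1%}, output: {out_rate:.1%}")
```

On this toy data the constraint holds in one of four chains of thought and three of four final outputs, the same kind of gap (though not the same size) as the 2.7% versus 61.9% figures quoted in the summary.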


#ai-safety #chain-of-thought #ai-monitoring #reasoning-models #ai-research #claude #arxiv

