AIBearisharXiv – CS AI · 7h ago7/10
🧠
Calibration Drift Under Reasoning: How Chain-of-Thought Budgets Induce Overconfidence in Large Language Models
Researchers discover that Chain-of-Thought reasoning in large language models can paradoxically increase overconfidence when reasoning budgets exceed task-specific thresholds, a phenomenon called Calibration Drift Under Reasoning (CDUR). The study shows that while extended reasoning initially improves accuracy, it eventually produces internally consistent but incorrect explanations that mislead models into false confidence, with implications for safe LLM deployment.
🧠 Llama