AINeutral — arXiv · CS AI · 9h ago
Closing the Confidence-Faithfulness Gap in Large Language Models
Researchers identify a fundamental misalignment in large language models: verbalized confidence scores fail to track actual accuracy because confidence and correctness are encoded along orthogonal directions in the model's internal representations. They also document a 'Reasoning Contamination Effect', in which reasoning performed alongside confidence reporting disrupts calibration, and propose a two-stage adaptive steering pipeline to bring stated confidence back into alignment with accuracy.
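The summary does not say how the paper quantifies the confidence-faithfulness gap, but a standard way to measure misalignment between stated confidence and actual accuracy is expected calibration error (ECE). The sketch below is illustrative only; the function name and toy data are ours, not from the paper.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin answers by stated confidence, then weight each bin's
    |mean confidence - empirical accuracy| gap by its size (standard ECE)."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if lo == 0.0:
            mask |= confidences == 0.0  # include exact zeros in the first bin
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # bin weight = fraction of samples
    return ece

# Toy example: a model says 90% twice (both right) and 10% once (wrong).
print(expected_calibration_error([0.9, 0.9, 0.1], [1, 1, 0]))  # → 0.1
```

An ECE of zero means each confidence bucket's accuracy matches the confidence the model verbalizes; the gap the paper targets shows up as a large ECE despite high task accuracy.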