Confidence-Aware Alignment Makes Reasoning LLMs More Reliable
Researchers introduce CASPO, a framework that improves reasoning reliability in large language models by aligning token-level confidence with step-wise logical correctness through preference optimization. The method achieves better performance than tree-search approaches without requiring separate reward models, while introducing CaT inference that dynamically prunes uncertain reasoning branches with minimal computational overhead.