AI · Neutral · arXiv – CS AI · 6h ago · 6/10
Measuring Black-Box Confidence via Reasoning Trajectories: Geometry, Coverage, and Verbalization
Researchers propose a black-box confidence estimation method for chain-of-thought reasoning that measures how reasoning trajectories converge instead of relying on heavy sampling. In tests across multiple benchmarks and AI models, the method shows significant improvements over self-consistency baselines while needing only 4 samples rather than 8, which could support safer deployment of AI systems accessed through APIs.
🧠 GPT-5 · 🧠 Claude · 🧠 Sonnet