arXiv – CS AI
Asking Is Not Enough: Protocol Sensitivity in LLM Confidence Calibration
Researchers demonstrate that measurements of large language model (LLM) confidence calibration are highly sensitive to methodological choices: how answers are selected, how token probabilities are computed, and which conditioning context is applied. The study finds that verbalized confidence often tracks how plausible an answer looks rather than whether it is correct, challenging common assumptions about LLM uncertainty quantification.
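To make the protocol-sensitivity point concrete, here is a minimal Python sketch of how one measurement choice can move a calibration score. It is not the paper's method: the summary does not specify the protocols studied, so the two scoring rules (joint sequence probability versus length-normalized probability), the function names, the bin count, and the toy log-probabilities are all illustrative assumptions. Expected calibration error (ECE) is used as a standard calibration metric.

```python
import math

def joint_confidence(token_logprobs):
    # Assumed protocol A: confidence = probability of the whole answer sequence.
    return math.exp(sum(token_logprobs))

def length_normalized_confidence(token_logprobs):
    # Assumed protocol B: confidence = geometric mean of per-token probabilities.
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def expected_calibration_error(confidences, correct, n_bins=5):
    # Standard binned ECE: weighted average of |mean confidence - accuracy| per bin.
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece

# Hypothetical toy data (NOT from the paper): per-token log-probabilities of
# each answer, paired with whether that answer was actually correct.
answers = [
    ([-0.05, -0.10], True),
    ([-0.20, -0.30, -0.40, -0.50], True),
    ([-0.10, -0.20, -0.10], False),
    ([-1.00, -1.20], False),
]
correct = [ok for _, ok in answers]

joint = [joint_confidence(lp) for lp, _ in answers]
normalized = [length_normalized_confidence(lp) for lp, _ in answers]

ece_joint = expected_calibration_error(joint, correct)
ece_norm = expected_calibration_error(normalized, correct)

print(f"ECE with joint probability:        {ece_joint:.3f}")
print(f"ECE with length-normalized scores: {ece_norm:.3f}")
```

The model outputs and correctness labels are identical in both runs. Only the rule for turning token probabilities into a confidence differs, yet the reported ECE changes, which is the kind of sensitivity the study describes.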