🧠 AI | ⚪ Neutral | Importance 7/10

The Phenomenology of Hallucinations

arXiv – CS AI | Valeria Ruscio, Keiran Thompson | March 17, 2026 at 04:00 AM

🤖 AI Summary

Researchers found that AI language models hallucinate not because they fail to detect uncertainty, but because they cannot integrate uncertainty signals into output generation. The study shows that models identify uncertain inputs internally, yet these signals, though geometrically amplified, remain functionally silent because they are only weakly coupled to the output layers.
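The idea of a signal that is "amplified yet functionally silent" can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: a 3-dimensional hidden state, a hypothetical uncertainty direction `u`, and a hypothetical two-token output matrix `W` whose rows put almost no weight on that direction.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def logits(W, h):
    """Linear readout: one logit per row of W."""
    return [dot(row, h) for row in W]

# Hypothetical uncertainty direction (axis 2 of the hidden space).
u = [0.0, 0.0, 1.0]

# Hypothetical output weights for two tokens. The tiny entries on
# axis 2 stand in for "weak coupling" to the uncertainty direction.
W = [[1.0, 0.0, 0.01],
     [0.0, 1.0, -0.01]]

h_certain = [2.0, 1.0, 0.0]    # no uncertainty component
h_uncertain = [2.0, 1.0, 5.0]  # large uncertainty component

# A linear probe along u separates the two states clearly...
probe_gap = dot(u, h_uncertain) - dot(u, h_certain)

# ...but the logits barely move, so the output is almost unchanged.
logit_shift = max(
    abs(a - b)
    for a, b in zip(logits(W, h_uncertain), logits(W, h_certain))
)

print(f"probe gap: {probe_gap:.2f}, largest logit shift: {logit_shift:.2f}")
```

In this toy setup the probe reading differs by 5 units while no logit moves by more than about 0.05, which is the sense in which a clearly detectable internal signal can leave the generated output untouched.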

Key Takeaways

→ Language models can reliably detect uncertain inputs internally; these inputs occupy regions that are 2-3x more dimensionally complex than those of factual inputs.
→ The core issue is weak coupling between uncertainty detection and output generation, not a failure to recognize uncertainty.
→ Cross-entropy training rewards confident predictions uniformly and gives models no mechanism for abstaining from an answer.
→ Uncertainty representations fragment rather than converge to a unified state from which models could refuse to answer.
→ Causal interventions showed that directly connecting uncertainty signals to output logits can restore refusal behavior.
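Two of the takeaways above, the cross-entropy point and the logit intervention, can be sketched in a few lines. This is a toy under stated assumptions, not the authors' procedure: the three-token vocabulary, the `REFUSE` index, the scalar `uncertainty` value, and the `alpha` gain are all hypothetical.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def cross_entropy(z, target):
    """Negative log-probability of the target token."""
    return -math.log(softmax(z)[target])

def argmax(z):
    return max(range(len(z)), key=lambda i: z[i])

# Hypothetical vocabulary: 0 = answer A, 1 = answer B, 2 = refusal.
REFUSE = 2

# Part 1: the loss only falls as confidence in the target rises.
# Nothing in it depends on whether the input was answerable.
losses = [cross_entropy([c, 0.0, 0.0], target=0) for c in (1.0, 3.0, 6.0)]

# Part 2: a hand-wired intervention (assumed form) that adds the
# uncertainty signal, scaled by alpha, to the refusal logit.
def intervene(z, uncertainty, alpha):
    out = list(z)
    out[REFUSE] += alpha * uncertainty
    return out

base = [3.0, 1.0, 0.0]   # model confidently prefers answer A
uncertainty = 5.0        # hypothetical scalar read from a probe
patched = intervene(base, uncertainty, alpha=1.0)

print("losses:", [round(l, 3) for l in losses])
print("before:", argmax(base), "after:", argmax(patched))
```

With these made-up numbers the preferred token moves from answer A to the refusal token once the uncertainty signal is wired into the logits; the strength of coupling needed in a real model is not something this sketch can say.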