🧠 AI · Neutral · Importance 7/10

A Geometric Taxonomy of Hallucinations in LLMs

arXiv – CS AI | Javier Marín
🤖AI Summary

Researchers propose a geometric framework for detecting hallucinations in large language models by analyzing embedding space structure, categorizing three types of errors with different detectability profiles. The approach outperforms standard NLI baselines on expert-annotated datasets, providing interpretable diagnostics for production systems operating under black-box constraints.

Analysis

This research addresses a critical gap in LLM deployment: hallucination detection within realistic operational constraints where only the query, response, and source document are available. When detection mechanisms fail, current production systems offer no interpretable diagnostics, leaving them blind to their own errors. The geometric taxonomy elegantly connects theoretical properties of contrastive sentence encoders to practical detection capabilities.

The framework identifies three hallucination types with fundamentally different geometric signatures. Query-proximate unfaithfulness manifests as angular deviation from correct responses and remains detectable through angular ratios. Confabulation beyond the plausibility region produces directional signatures that empirically outperform NLI baselines. However, factual errors that share vocabulary and semantic frames with correct answers become geometrically indistinguishable—a finding that predicts where detection methods fundamentally fail.
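To make the angular-ratio idea concrete, here is a minimal sketch of how such a check might look under the black-box setup (only query, response, and source document available). The paper's actual scoring function is not reproduced here; the encoder choice, the ratio definition, and the threshold below are illustrative assumptions.

```python
# Illustrative angular-ratio heuristic over contrastive sentence embeddings.
# The specific encoder, ratio, and THRESHOLD are assumptions, not the paper's method.
import numpy as np
from sentence_transformers import SentenceTransformer

_model = SentenceTransformer("all-MiniLM-L6-v2")  # any contrastive sentence encoder

def angle(u: np.ndarray, v: np.ndarray) -> float:
    """Angle (radians) between two embedding vectors."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def angular_ratio(query: str, response: str, source: str) -> float:
    """Response–source angle relative to the query–source angle.

    Intuition only: a response that drifts much farther from the source
    than the query itself does is treated as geometrically suspect.
    """
    q, r, s = _model.encode([query, response, source])
    return angle(r, s) / max(angle(q, s), 1e-8)

THRESHOLD = 1.5  # illustrative cutoff, not taken from the paper

def flag_query_proximate_unfaithfulness(query: str, response: str, source: str) -> bool:
    return angular_ratio(query, response, source) > THRESHOLD
```

Note that a heuristic of this kind can only surface errors that have a geometric signature; as the analysis below argues, factual errors sharing vocabulary and semantic frames with correct answers would pass such a check.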

For the AI industry, this research provides principled understanding of detection limitations rather than false optimism. Production systems can now predict which hallucination types they can reliably catch versus which require additional safeguards. The human-confabulated dataset across nine domains bridges the gap between theoretical research and deployment reality, enabling validation on realistic error patterns.

The implications extend beyond detection to model improvement. Understanding that certain error classes remain geometrically inseparable suggests training approaches must address fundamental limitations rather than expect embedding-based solutions to solve all hallucination problems. This work establishes clear boundaries for black-box detection, directing future research toward hybrid approaches combining geometric methods with other validation strategies.

Key Takeaways
  • Geometric analysis of embedding spaces predicts which hallucination types are detectable and which are not.
  • Query-proximate unfaithfulness and confabulation are geometrically distinguishable and detectable via angular signatures.
  • Factual errors with shared vocabulary and semantic frames remain geometrically inseparable from correct answers.
  • The proposed method outperforms NLI baselines on expert-annotated datasets without requiring white-box access.
  • Research validates findings on a 212-pair human-confabulated dataset across nine domains for deployment relevance.
Read Original → via arXiv – CS AI