Why LLMs Hallucinate on Structured Knowledge: A Mechanistic Analysis of Reasoning over Linearized Representations
Researchers have identified the mechanistic causes of hallucinations in large language models when they reason over structured knowledge such as graphs and tables. The study finds that these hallucinations stem from systematic failures in attention allocation and in semantic grounding within feed-forward layers, rather than from random errors, and that the same pattern holds across multiple structured knowledge formats.
This mechanistic analysis addresses a fundamental limitation of large language models that has significant implications for AI reliability and deployment. When LLMs process structured knowledge converted into sequential tokens, they frequently generate false or unsupported outputs despite having access to correct information. The research moves beyond surface-level observations to identify the precise internal dynamics responsible for these failures.
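To make "structured knowledge converted into sequential tokens" concrete, here is a minimal sketch of linearization in Python. The marker strings (`<H>`, `<R>`, `<T>`, `[ROW]`, `|`) and the function names are illustrative assumptions, not the format used in the study.

```python
def linearize_triples(triples):
    """Flatten (head, relation, tail) triples into one string.

    The <H>/<R>/<T> markers are hypothetical structural tokens.
    """
    return " ".join(f"<H> {h} <R> {r} <T> {t}" for h, r, t in triples)


def linearize_table(header, rows):
    """Flatten a table row by row; '|' separates cells, '[ROW]' separates rows."""
    lines = [" | ".join(header)]
    lines += [" | ".join(str(cell) for cell in row) for row in rows]
    return " [ROW] ".join(lines)


triples = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Warsaw", "capital_of", "Poland"),
]
graph_text = linearize_triples(triples)
table_text = linearize_table(["city", "country"], [["Warsaw", "Poland"]])

print(graph_text)
print(table_text)
```

Once flattened this way, the model sees facts and structural markers as one undifferentiated token stream, which is the setting in which the failures described here arise.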
The findings highlight two critical failure modes in LLM architectures. First, attention mechanisms focus disproportionately on superficial structural cues instead of spreading attention across the full knowledge context, which creates shortcut dependencies. Second, feed-forward networks fail to ground the external knowledge properly, so the model falls back on its parametric memory: it reverts to patterns learned during training rather than reasoning from the facts provided. This explains why even well-structured information fails to prevent hallucinations.
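One rough way to picture the attention shortcut is to measure how much attention mass lands on structural markers versus content tokens. The sketch below is a toy illustration on hand-written weights, not the study's measurement; the `STRUCTURAL` token set, the function names, and the example weights are all assumptions.

```python
import math

# Hypothetical set of structural marker tokens in a linearized input.
STRUCTURAL = {"<H>", "<R>", "<T>", "|", "[ROW]"}


def structural_attention_share(tokens, weights):
    """Fraction of total attention mass that falls on structural tokens."""
    total = sum(weights)
    structural = sum(w for t, w in zip(tokens, weights) if t in STRUCTURAL)
    return structural / total


def attention_entropy(weights):
    """Shannon entropy (in nats) of the normalized attention weights.

    Lower entropy means attention is concentrated on fewer tokens.
    """
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probs)


tokens = ["<H>", "Curie", "<R>", "born_in", "<T>", "Warsaw"]
# Made-up weights: one head fixated on markers, one spread over content.
shortcut_head = [0.30, 0.05, 0.30, 0.05, 0.25, 0.05]
balanced_head = [0.10, 0.20, 0.10, 0.25, 0.10, 0.25]

shortcut_share = structural_attention_share(tokens, shortcut_head)
balanced_share = structural_attention_share(tokens, balanced_head)
print(round(shortcut_share, 2), round(balanced_share, 2))
print(attention_entropy(shortcut_head) < attention_entropy(balanced_head))
```

In a real analysis the weights would come from a model's attention maps for a given layer and head; here they are invented only to show the contrast between a marker-fixated pattern and a more evenly distributed one.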
These insights carry substantial implications for developers building AI systems that rely on external knowledge integration. Current approaches to knowledge incorporation may be fundamentally limited by architectural constraints rather than by dataset quality or fine-tuning strategies. The research reports that hallucination patterns generalize consistently across graphs, tables, and multi-hop reasoning tasks, which suggests that the underlying issues are architectural rather than task-specific.
For practitioners, this work provides a roadmap for potential improvements. Understanding that semantic grounding in feed-forward layers is the primary failure point suggests that targeted architectural modifications could improve reliability. The ability to detect hallucinations from mechanistic patterns opens the possibility of verifying reliability without ground-truth labels. Future work should focus on architectural redesigns that strengthen feed-forward grounding and improve how attention is distributed across the full knowledge context.
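The summary does not describe how such a detector is built. Purely as an illustration of what a label-free check could look like, the sketch below assumes two per-answer scores are already available: the share of attention on structural tokens, and a hypothetical `grounding_score` reflecting how strongly feed-forward activations draw on the provided context. The class, the function, and both thresholds are assumptions, not the study's method.

```python
from dataclasses import dataclass


@dataclass
class MechanisticSignals:
    # Hypothetical score in [0, 1]: attention mass on structural markers.
    structural_attention_share: float
    # Hypothetical score in [0, 1]: higher means more reliance on the
    # provided context rather than on parametric memory.
    grounding_score: float


def flag_possible_hallucination(
    signals: MechanisticSignals,
    max_structural_share: float = 0.6,  # illustrative threshold
    min_grounding: float = 0.4,         # illustrative threshold
) -> bool:
    """Flag an answer when either internal signal looks suspicious.

    No ground-truth label is consulted; only the model's internal signals.
    """
    return (
        signals.structural_attention_share > max_structural_share
        or signals.grounding_score < min_grounding
    )


suspicious = MechanisticSignals(structural_attention_share=0.85, grounding_score=0.2)
healthy = MechanisticSignals(structural_attention_share=0.30, grounding_score=0.8)
print(flag_possible_hallucination(suspicious))
print(flag_possible_hallucination(healthy))
```

A fixed-threshold rule keeps the idea visible; in practice the thresholds, or a learned classifier over such signals, would need to be calibrated on held-out data.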
- LLM hallucinations on structured knowledge result from systematic attention failures and weak semantic grounding in feed-forward layers, not random errors.
- Attention mechanisms create shortcuts by concentrating on structural cues rather than distributing attention across the full knowledge context.
- Feed-forward layers fail to ground external knowledge, causing models to rely on parametric memory instead of the provided facts.
- Hallucination patterns generalize across graphs, tables, and multi-hop reasoning tasks, indicating architectural rather than task-specific limitations.
- Mechanistic understanding enables hallucination detection without ground-truth labels across different structured knowledge formats.