🧠 AI · Neutral · Importance 7/10

Attractor Geometry of Transformer Memory: From Conflict Arbitration to Confident Hallucination

arXiv – CS AI | Qiyao Liang, Risto Miikkulainen, Ila Fiete

🤖 AI Summary

Researchers have identified a geometric framework that explains how language models fail through two distinct mechanisms: conflict between parametric memory and working memory, and hallucination when the relevant fact was never learned. Both failures produce confident outputs despite being mechanistically different, but hidden-state geometry and a 'geometric margin' metric can distinguish them more reliably than traditional entropy-based detection methods.

Analysis

This research addresses a fundamental problem in large language model deployment: confident hallucinations that persist even as model accuracy improves. The authors move beyond output-level analysis to examine the geometric structure of hidden states during generation, revealing that learned facts create distinct attractor basins in activation space. When parametric memory and working memory conflict, the model gets trapped between competing basins without raising uncertainty signals. When facts were never learned, the hidden state simply drifts without converging anywhere.
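
As a rough illustration of that geometric picture, the sketch below (an assumption about how one might operationalize it, not the paper's code) classifies a final-step hidden state by its distance to estimated basin centers: close to one basin suggests correct recall, roughly equidistant between two basins suggests a memory conflict, and far from every basin suggests a never-learned fact. The basin centers, tolerances, and the choice of the last decoding step are illustrative assumptions.

```python
# Minimal sketch: classify a hidden-state trajectory by its geometry relative
# to estimated "attractor" centers. Basin centers are assumed to be mean
# final-layer states over prompts the model answers correctly; thresholds are
# illustrative, not values from the paper.
import numpy as np

def classify_trajectory(hidden_states, basin_centers,
                        converge_tol=0.1, conflict_gap=0.05):
    """hidden_states: (T, d) final-layer states over decoding steps.
    basin_centers: (K, d) estimated centers of learned-fact attractors."""
    final = hidden_states[-1]
    dists = np.linalg.norm(basin_centers - final, axis=1)  # distance to each basin
    sorted_d = np.sort(dists)
    nearest = sorted_d[0]
    second = sorted_d[1] if len(sorted_d) > 1 else np.inf

    # Correct recall: the state has settled close to one learned basin.
    if nearest < converge_tol:
        return "recall", nearest
    # Memory conflict: the state sits roughly equidistant between two basins.
    if second - nearest < conflict_gap:
        return "conflict", nearest
    # Hallucination: the state never approaches any learned basin.
    return "hallucination", nearest
```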

The implications extend beyond academic interest. Current confidence-based rejection mechanisms fail catastrophically because they must reject most correct outputs to catch hallucinations, an untenable tradeoff for production systems. The geometric margin metric, which measures the distance from the hidden state to the nearest learned basin, sidesteps this tradeoff by reading the model's internal uncertainty representation directly. The synthetic experiments isolate individual components to establish the mechanism causally, while validation on pretrained models shows the findings generalize beyond fine-tuning artifacts.
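
A hedged sketch of how a margin-based refusal rule might work in practice follows. The calibration procedure (setting the threshold to the largest margin observed on known-correct recalls, so that no correct answer on that set is refused) and the entropy baseline are assumptions for illustration, not the paper's exact protocol.

```python
# Sketch of a margin-based refusal rule, contrasted with output entropy.
# `margin` is assumed to be the distance from the final hidden state to the
# nearest learned basin center (as in the previous sketch); the calibration
# set is assumed to contain only prompts the model answers correctly.
import numpy as np

def calibrate_threshold(correct_margins):
    """Use the largest margin seen on known-correct recalls, so refusing
    anything above it yields zero false refusals on that calibration set."""
    return float(np.max(correct_margins))

def should_refuse(margin, threshold):
    # Refuse only when the hidden state is farther from every learned basin
    # than any correct recall was during calibration.
    return margin > threshold

def token_entropy(probs):
    # Entropy of the output distribution, for comparison: confident
    # hallucinations can have low entropy, so an entropy cutoff must also
    # reject many correct answers to catch them.
    probs = np.asarray(probs)
    return float(-(probs * np.log(probs + 1e-12)).sum())
```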

A critical finding is the scaling law: confident hallucinations grow exponentially with model scale even as overall error rates fall. This divergence means larger models look more capable on aggregate accuracy while the confidence of their hallucinations worsens, producing deceptive scaling curves for practitioners. The frozen language-model head, optimized for next-token prediction, systematically erases the epistemic information that hidden states reliably encode. This architectural mismatch worsens with scale, suggesting that reliable uncertainty quantification may require architectural modifications rather than fine-tuning alone.
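
One common way to test the claim that hidden states carry epistemic information the output head discards (again a sketch under assumptions, not the authors' procedure) is to train a simple linear probe on hidden states to predict whether a queried fact was ever learned, and compare it against thresholding the head's softmax confidence. The arrays below are hypothetical placeholders you would collect from a model.

```python
# Sketch: linear probe on hidden states vs. output-head confidence for
# separating learned from never-learned facts. X_hidden holds final-layer
# states, knows_fact labels whether the fact was in training, and
# head_confidence is the head's max softmax probability per prompt.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_vs_head(X_hidden, head_confidence, knows_fact):
    """Compare a hidden-state probe against output-head confidence."""
    X_tr, X_te, y_tr, y_te, conf_tr, conf_te = train_test_split(
        X_hidden, knows_fact, head_confidence, test_size=0.3, random_state=0)

    # Probe: a linear classifier reading epistemic signal from hidden states.
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    probe_acc = probe.score(X_te, y_te)

    # Baseline: treat high softmax confidence as "known" (arbitrary 0.5 cutoff).
    head_acc = float(np.mean((conf_te > 0.5).astype(int) == y_te))
    return probe_acc, head_acc
```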

Key Takeaways
  • Hallucination and memory conflict are geometrically distinct failure modes addressable through attractor basin analysis rather than output entropy
  • Geometric margin metric separates correct recall from hallucination with zero false refusals where entropy-based methods fail
  • Confident hallucinations follow an exponential scaling law, increasing with model size despite overall accuracy improvements
  • Hidden states encode epistemic uncertainty reliably, but the frozen output head architecture systematically erases this signal
  • Current confidence-based rejection mechanisms face fundamental tradeoffs requiring architectural rather than training-based solutions
Read Original → (via arXiv – CS AI)