y0news
🧠 AI · 🔴 Bearish · Importance 7/10

From Dispersion to Attraction: Spectral Dynamics of Hallucination Across Whisper Model Scales

arXiv – CS AI | Ivan Viakhirev, Kirill Borodin, Grach Mkrtchian
🤖AI Summary

Researchers propose the Spectral Sensitivity Theorem to explain hallucinations in large ASR models like Whisper, identifying a phase transition between dispersive and attractor regimes. Analysis of model eigenspectra reveals that intermediate models experience structural breakdown while large models compress information, decoupling from acoustic evidence and increasing hallucination risk.

Analysis

This research addresses a fundamental vulnerability in modern speech recognition systems at scale. The Spectral Sensitivity Theorem provides a mathematical framework for understanding why hallucinations emerge differently across model sizes, moving beyond empirical observation to theoretical prediction. The phase transition concept suggests hallucinations aren't random failures but predictable consequences of how information flows through network layers as models grow larger.

The distinction between Regime I (Structural Disintegration) and Regime II (Compression-Seeking Attractor) reveals a critical tradeoff. Intermediate models suffer rank collapse in their cross-attention mechanisms, losing coherence in the mapping from audio to transcription. Larger models, paradoxically, develop more dangerous failure modes: they actively compress information and harden their spectral slopes, which lets them generate plausible-sounding but acoustically unfounded outputs. This decoupling from acoustic evidence marks a qualitative shift in failure behavior.
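As an illustration of how rank collapse can be quantified, the entropy-based effective rank of Roy and Vetterli gives a continuous measure of how many directions an attention matrix actually uses. This is a minimal NumPy sketch, not the paper's own estimator:

```python
import numpy as np

def effective_rank(attn: np.ndarray) -> float:
    """Entropy-based effective rank (Roy & Vetterli).

    A drop in this value across layers is one way to quantify
    the rank collapse of a cross-attention map.
    """
    s = np.linalg.svd(attn, compute_uv=False)
    p = s / s.sum()                          # normalize singular values
    entropy = -np.sum(p * np.log(p + 1e-12))
    return float(np.exp(entropy))

# A full-rank random matrix has high effective rank...
rng = np.random.default_rng(0)
full = rng.standard_normal((64, 64))
# ...while a low-rank product collapses it toward its true rank (2).
low = rng.standard_normal((64, 2)) @ rng.standard_normal((2, 64))
print(effective_rank(full) > effective_rank(low))  # True
```

Tracking this statistic layer by layer, under clean versus adversarial audio, would be one simple way to observe the breakdown the authors describe.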

For AI practitioners deploying Whisper models in production, these findings carry significant implications. Organizations relying on smaller Whisper variants may see instability in edge cases, while larger deployments risk confident hallucinations that users cannot easily detect. The research suggests that scaling doesn't simply amplify performance; it fundamentally reshapes how models process information and fail under adversarial conditions.

Future work should explore mitigation strategies that target these specific regimes, perhaps through regularization techniques that prevent excessive rank compression or maintain alignment between attention mechanisms and acoustic input. Understanding these phase transitions opens pathways to building more robust speech recognition systems that fail gracefully rather than confidently producing false outputs.
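One hypothetical form such a regularizer could take is a penalty on how concentrated the singular-value spectrum of an attention matrix becomes, normalized to [0, 1]. This sketches the idea only; it is not a method proposed in the paper:

```python
import numpy as np

def rank_spread_penalty(attn: np.ndarray) -> float:
    """Hypothetical penalty that grows as singular values concentrate.

    Added to a training loss, minimizing it would discourage the
    excessive rank compression described above. Returns roughly
    0 for a flat spectrum and roughly 1 for a rank-1 spectrum.
    """
    s = np.linalg.svd(attn, compute_uv=False)
    p = s / s.sum()
    entropy = -np.sum(p * np.log(p + 1e-12))
    return float(1.0 - entropy / np.log(len(p)))

rng = np.random.default_rng(1)
flat = np.eye(32)                               # flat spectrum
spiky = np.outer(rng.standard_normal(32),
                 rng.standard_normal(32))       # rank-1 spectrum
print(rank_spread_penalty(flat) < rank_spread_penalty(spiky))  # True
```

In practice such a term would be computed on the fly during training (with a differentiable SVD or a cheap proxy), but the scalar itself is enough to convey the idea.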

Key Takeaways
  • The Spectral Sensitivity Theorem predicts a phase transition in deep networks from dispersive to attractor regimes based on layer-wise gain and alignment patterns.
  • Intermediate Whisper models experience structural disintegration with 13.4% cross-attention rank collapse under adversarial conditions.
  • Larger models develop more dangerous failure modes by actively compressing information while decoupling from acoustic evidence.
  • Hallucinations in ASR models follow predictable mathematical patterns rather than random failures, enabling targeted mitigation strategies.
  • Model scale changes the qualitative nature of failures, with larger systems producing confident but unfounded hallucinations.
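The "hardening spectral slopes" mentioned above can be made concrete by fitting a line to the eigenspectrum on log-log axes: a steeper (more negative) slope means energy concentrated in fewer directions. A minimal sketch, assuming a NumPy array of eigenvalues (the paper's exact estimator may differ):

```python
import numpy as np

def spectral_slope(eigvals: np.ndarray) -> float:
    """Slope of the eigenspectrum on log-log axes.

    Fits log(lambda_k) against log(k) for eigenvalues sorted in
    descending order; a power law lambda_k ~ k^(-alpha) yields -alpha.
    """
    lam = np.sort(np.abs(eigvals))[::-1]
    lam = lam[lam > 0]                      # drop numerically zero modes
    ranks = np.arange(1, len(lam) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(lam), 1)
    return float(slope)

# A synthetic power-law spectrum lambda_k = k^(-1.5) recovers its exponent.
k = np.arange(1, 101).astype(float)
print(round(spectral_slope(k ** -1.5), 2))  # -1.5
```

Comparing this slope across model sizes and layers is one way a practitioner could probe whether a deployment sits in the dispersive or the attractor regime.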