From Dispersion to Attraction: Spectral Dynamics of Hallucination Across Whisper Model Scales
Researchers propose the Spectral Sensitivity Theorem to explain hallucinations in large ASR models like Whisper, identifying a phase transition between dispersive and attractor regimes. Analysis of model eigenspectra reveals that intermediate models experience structural breakdown while large models compress information, decoupling from acoustic evidence and increasing hallucination risk.
This research addresses a fundamental vulnerability in modern speech recognition systems at scale. The Spectral Sensitivity Theorem provides a mathematical framework for understanding why hallucinations emerge differently across model sizes, moving beyond empirical observation to theoretical prediction. The phase transition concept suggests hallucinations aren't random failures but predictable consequences of how information flows through network layers as models grow larger.
The distinction between Regime I (Structural Disintegration) and Regime II (Compression-Seeking Attractor) reveals a critical tradeoff. Intermediate models suffer rank collapse in their cross-attention mechanisms, losing coherence in the mapping from audio to transcription. Larger models, paradoxically, develop a more dangerous failure mode: they actively compress information and harden their spectral slopes, enabling the model to generate plausible-sounding but acoustically unfounded outputs. This decoupling from acoustic evidence represents a qualitative shift in failure behavior.
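The two regimes can be probed with standard spectral diagnostics. The sketch below (a minimal illustration, not the paper's actual methodology) shows two such measurements on an attention-like matrix: entropy-based effective rank, which drops under the Regime I rank collapse, and the log-log slope of the singular-value spectrum, which steepens ("hardens") under the Regime II compression. The matrices here are synthetic stand-ins for real cross-attention maps.

```python
import numpy as np

def effective_rank(M: np.ndarray) -> float:
    """Entropy-based effective rank: exp of the Shannon entropy of the
    normalized singular-value distribution. Near min(M.shape) for a
    well-spread spectrum; small when the spectrum has collapsed."""
    s = np.linalg.svd(M, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))

def spectral_slope(M: np.ndarray) -> float:
    """Least-squares slope of log singular value vs. log index.
    A more negative slope means harder spectral decay, i.e. the
    matrix concentrates its energy in fewer directions."""
    s = np.linalg.svd(M, compute_uv=False)
    s = s[s > 1e-12]
    idx = np.arange(1, len(s) + 1)
    slope, _ = np.polyfit(np.log(idx), np.log(s), 1)
    return float(slope)

# Toy comparison: a near-full-rank random matrix vs. one whose
# spectrum has been exponentially compressed.
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 64))          # healthy spectrum
B = A @ np.diag(0.5 ** np.arange(64))      # compressed spectrum

print(effective_rank(A), spectral_slope(A))
print(effective_rank(B), spectral_slope(B))
```

On the compressed matrix, effective rank drops sharply and the spectral slope steepens, mirroring the two failure signatures the paper distinguishes.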
For AI practitioners deploying Whisper models in production, these findings carry significant implications. Organizations relying on smaller Whisper variants may see instability in edge cases, while larger deployments risk confident hallucinations that users cannot easily detect. The research suggests that scaling doesn't simply amplify performance—it fundamentally reshapes how models process information and fail under adversarial conditions.
Future work should explore mitigation strategies that target these specific regimes, perhaps through regularization techniques that prevent excessive rank compression or maintain alignment between attention mechanisms and acoustic input. Understanding these phase transitions opens pathways to building more robust speech recognition systems that fail gracefully rather than confidently producing false outputs.
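One shape such a regularizer could take (a hypothetical sketch, not anything proposed in the paper) is a differentiable penalty that fires when the spectral entropy of a cross-attention map falls below a threshold, discouraging the rank compression associated with Regime II. The function name, the `tau` threshold, and the per-head normalization are all illustrative assumptions.

```python
import torch

def rank_preservation_penalty(attn: torch.Tensor, tau: float = 0.9) -> torch.Tensor:
    """Hypothetical regularizer: penalize attention maps whose
    singular-value mass concentrates in too few directions.

    attn: (heads, tgt_len, src_len) cross-attention weights for one layer.
    Returns a scalar penalty that is zero while the normalized spectral
    entropy of every head stays above tau.
    """
    penalty = attn.new_zeros(())
    for head in attn:
        s = torch.linalg.svdvals(head)                 # per-head spectrum
        p = s / s.sum()
        ent = -(p * (p + 1e-12).log()).sum()           # spectral entropy
        max_ent = torch.log(torch.tensor(float(len(s))))
        # Only penalize heads whose spectrum has collapsed below tau.
        penalty = penalty + torch.relu(tau - ent / max_ent)
    return penalty / attn.shape[0]

# A full-rank (identity-like) map incurs no penalty; a rank-1 map does.
healthy = rank_preservation_penalty(torch.eye(8).unsqueeze(0))
collapsed = rank_preservation_penalty(torch.ones(1, 8, 8) / 8)
```

Added to a fine-tuning loss, a term like this would push gradients against excessive compression while leaving well-spread attention spectra untouched.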
- The Spectral Sensitivity Theorem predicts a phase transition in deep networks from dispersive to attractor regimes based on layer-wise gain and alignment patterns.
- Intermediate Whisper models experience structural disintegration with 13.4% cross-attention rank collapse under adversarial conditions.
- Larger models develop more dangerous failure modes by actively compressing information while decoupling from acoustic evidence.
- Hallucinations in ASR models follow predictable mathematical patterns rather than random failures, enabling targeted mitigation strategies.
- Model scale changes the qualitative nature of failures, with larger systems producing confident but unfounded hallucinations.