AI · Neutral · arXiv – CS AI · 4h ago · 6/10
Dense Supervision Is Not Enough: The Readout Blind Spot in Looped Language Models
Researchers identify a supervision blind spot in looped language models: dense cross-entropy loss does not constrain the scale of the hidden states carried through the recurrent transition. Because scale-invariant readouts such as RMSNorm hide radial scaling from the loss, hidden-state norms can grow unchecked into the thousands. The study proposes architectural fixes, including scale-visible readouts and explicit normalization, to improve model efficiency and perplexity at matched inference depths.
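The scale-invariance point can be illustrated with a small sketch. This is not the paper's code: the function names, the toy vector `h`, the readout matrix `W`, and the choice of RMSNorm with no learned gain and `eps=0` are all assumptions made for illustration. It shows that multiplying a hidden state by 1000 leaves the cross-entropy computed behind an RMSNorm readout essentially unchanged, so the loss carries no signal about the state's norm.

```python
import math

def rms_norm(h, eps=0.0):
    """Rescale a vector to unit root-mean-square (no learned gain; eps=0 is an assumption)."""
    rms = math.sqrt(sum(x * x for x in h) / len(h) + eps)
    return [x / rms for x in h]

def readout_loss(h, W, target):
    """Cross-entropy of softmax(W @ rms_norm(h)) against a target index."""
    z = rms_norm(h)
    logits = [sum(w * x for w, x in zip(row, z)) for row in W]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_partition = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_partition - logits[target]

def l2_norm(h):
    return math.sqrt(sum(x * x for x in h))

# Hypothetical toy hidden state and readout matrix (3 classes, 4 dimensions).
h = [0.5, -1.0, 2.0, 0.25]
W = [[0.1, 0.2, -0.3, 0.4],
     [-0.5, 0.1, 0.2, 0.0],
     [0.3, -0.2, 0.1, 0.6]]

h_big = [1000.0 * x for x in h]  # same direction, 1000x the norm

base = readout_loss(h, W, target=1)
scaled = readout_loss(h_big, W, target=1)
norm_ratio = l2_norm(h_big) / l2_norm(h)

print(f"norm ratio: {norm_ratio:.1f}")
print(f"loss difference: {abs(base - scaled):.2e}")
```

In practice RMSNorm implementations use a small positive `eps` and a learned gain, so the invariance is approximate rather than exact, but the gradient along the radial direction stays close to zero for states with large norm.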