arXiv – CS AI
Spectral Scaling Laws of Muon
Researchers present the first systematic study of how the singular value spectra of momentum matrices in the Muon optimizer behave across model scales from 77M to 2.8B parameters. They find that singular value quantiles stabilize after an initial burn-in phase of training and then follow predictable power laws in model size. This lets practitioners choose Newton-Schulz iteration configurations for a given scale ahead of time and avoid wasted computation in large-scale training.
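The link between spectrum quantiles and Newton-Schulz cost can be sketched in a few lines. Muon divides the momentum matrix by its Frobenius norm and then runs a Newton-Schulz iteration that acts on each singular value independently, pushing it toward 1. The smallest normalized singular values therefore determine how many iterations are needed. This is an illustrative sketch under stated assumptions, not the paper's method: it uses the classic cubic map σ ← 1.5σ − 0.5σ³ rather than the tuned quintic polynomial in Muon's reference implementation, the function names are hypothetical, the tolerance of 1e-3 is arbitrary, and the power-law toy spectrum is invented rather than taken from the paper's measurements.

```python
import math


def newton_schulz_scalar(sigma: float, steps: int) -> float:
    """Apply `steps` iterations of the cubic Newton-Schulz map
    sigma <- 1.5*sigma - 0.5*sigma**3 to a single singular value.

    For X = U diag(s) V^T, the matrix iteration X <- 1.5*X - 0.5*X X^T X
    acts on each singular value independently with exactly this map.
    (Assumption: cubic map for clarity; Muon itself uses a tuned quintic.)
    """
    for _ in range(steps):
        sigma = 1.5 * sigma - 0.5 * sigma ** 3
    return sigma


def steps_to_converge(sigma: float, tol: float = 1e-3, max_steps: int = 100) -> int:
    """Smallest iteration count that brings `sigma` within `tol` of 1.

    Converges for sigma in (0, sqrt(3)); Frobenius normalization
    guarantees every singular value is at most 1.
    """
    for k in range(max_steps + 1):
        if abs(sigma - 1.0) <= tol:
            return k
        sigma = 1.5 * sigma - 0.5 * sigma ** 3
    raise ValueError("did not converge; sigma must lie in (0, sqrt(3))")


def normalized_quantiles(singular_values, qs):
    """Simple index-based quantiles of the singular values after dividing
    by the Frobenius norm (the norm of the whole spectrum)."""
    fro = math.sqrt(sum(s * s for s in singular_values))
    ordered = sorted(s / fro for s in singular_values)
    n = len(ordered)
    return [ordered[min(n - 1, int(q * n))] for q in qs]


if __name__ == "__main__":
    # Hypothetical power-law spectrum s_i = i**-0.5 (illustration only,
    # not data from the paper).
    spectrum = [i ** -0.5 for i in range(1, 513)]
    qs = [0.01, 0.1, 0.5, 0.9]
    for q, s in zip(qs, normalized_quantiles(spectrum, qs)):
        print(f"q={q:.2f}  sigma={s:.4f}  steps={steps_to_converge(s)}")
```

Running the script prints, for each quantile of the toy spectrum, the normalized singular value and the number of cubic iterations it needs. Lower quantiles need more steps, which is why quantiles that are stable and predictable across scales would let the iteration budget be fixed in advance.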