AI · Neutral · arXiv – CS AI · 8h ago · 6/10
Why Muon Outperforms Adam: A Curvature Perspective
Researchers show that Muon, an optimizer for large language model training, is roughly 2x more efficient than Adam, and attribute the gap to lower Normalized Directional Sharpness (NDS) rather than to smaller update scales. Using curvature analysis and stylized quadratic problems, the work finds that Muon's advantage comes from balancing update energy more evenly across regions of heterogeneous curvature, an effect that is amplified when the training data is imbalanced.
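
To make the idea of balancing update energy concrete, here is a minimal sketch on a toy diagonal quadratic. It is not the paper's code and implements neither Muon nor Adam. It assumes NDS can be read as the Rayleigh quotient d^T H d / d^T d of an update direction d, which may differ from the paper's exact definition. The curvature values and the helper name `normalized_directional_sharpness` are hypothetical, chosen for illustration.

```python
def normalized_directional_sharpness(curvatures, direction):
    """Assumed definition: Rayleigh quotient d^T H d / d^T d for diagonal H."""
    numerator = sum(h * d * d for h, d in zip(curvatures, direction))
    denominator = sum(d * d for d in direction)
    return numerator / denominator


# Hypothetical toy problem: f(w) = 0.5 * sum(h_i * w_i^2) with one sharp
# coordinate and three flat ones (heterogeneous curvature).
curvatures = [100.0, 1.0, 1.0, 1.0]
weights = [1.0, 1.0, 1.0, 1.0]

# The raw gradient concentrates its energy in the sharp coordinate.
gradient = [h * w for h, w in zip(curvatures, weights)]

# An energy-balanced direction: same signs, equal magnitude per coordinate.
balanced = [1.0 if g > 0 else -1.0 for g in gradient]

nds_gradient = normalized_directional_sharpness(curvatures, gradient)
nds_balanced = normalized_directional_sharpness(curvatures, balanced)

print(f"raw gradient direction: {nds_gradient:.2f}")
print(f"balanced direction:     {nds_balanced:.2f}")
```

In this toy setting the balanced direction sees much lower sharpness because it does not pile its energy onto the single high-curvature coordinate. Both directions are normalized by their own length, so the difference is not a matter of update scale.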