y0news

#muon-optimizer News & Analysis

5 articles tagged with #muon-optimizer. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds

Researchers prove that sign-based optimizers such as SignSGD outperform standard SGD under specific conditions, namely ℓ1-norm stationarity and sparse noise, with the complexity improvement scaling with the problem dimension d. The analysis connects theory to practice by demonstrating these advantages in GPT-2 pretraining, which offers an explanation for why sign-based methods succeed in large language model training, where they previously lacked theoretical justification.
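
For readers unfamiliar with the two update rules being compared, here is a minimal sketch in plain Python. It is not taken from the paper: the function names, the learning rate, and the example gradient are illustrative assumptions, and the stochastic part (mini-batch noise) is left out.

```python
def sgd_step(params, grads, lr):
    """Plain SGD: move each coordinate in proportion to its gradient."""
    return [p - lr * g for p, g in zip(params, grads)]


def sign(x):
    """Return -1.0, 0.0, or 1.0 according to the sign of x."""
    return float((x > 0) - (x < 0))


def signsgd_step(params, grads, lr):
    """SignSGD: move each coordinate by a fixed step lr, using only the gradient's sign."""
    return [p - lr * sign(g) for p, g in zip(params, grads)]


params = [1.0, -2.0, 0.5]
# Hypothetical gradient whose coordinates differ widely in scale.
grads = [0.001, -50.0, 0.0]

sgd_out = sgd_step(params, grads, lr=0.1)       # approximately [0.9999, 3.0, 0.5]
sign_out = signsgd_step(params, grads, lr=0.1)  # approximately [0.9, -1.9, 0.5]
```

With SGD the large coordinate dominates the step, while SignSGD moves every coordinate with a nonzero gradient by the same amount. This is only meant to show the mechanics; the conditions under which that behavior is an advantage are what the paper analyzes.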

AI · Bullish · arXiv – CS AI · Mar 12 · 7/10

HTMuon: Improving Muon via Heavy-Tailed Spectral Correction

Researchers have developed HTMuon, an optimizer for training large language models that builds on the existing Muon optimizer. HTMuon addresses limitations in Muon's weight spectra by applying a heavy-tailed spectral correction, reducing perplexity by up to 0.98 in LLaMA pretraining experiments.
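
For context, the baseline Muon step is commonly described as orthogonalizing the momentum matrix before applying it as a weight update. The sketch below illustrates that idea only; it does not implement HTMuon's heavy-tailed correction. It swaps in the textbook cubic Newton-Schulz iteration for simplicity (published Muon code uses a tuned higher-order variant), and `orthogonalize`, `muon_like_step`, `lr=0.02`, and `beta=0.95` are illustrative assumptions.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def orthogonalize(m, steps=30):
    """Approximate the semi-orthogonal polar factor of m (assumes m is nonzero, full rank)."""
    # Dividing by the Frobenius norm bounds every singular value by 1,
    # which keeps the iteration inside its convergence region.
    fro = math.sqrt(sum(v * v for row in m for v in row))
    x = [[v / fro for v in row] for row in m]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        # X <- 1.5 X - 0.5 (X X^T) X pushes each singular value toward 1.
        x = [[1.5 * a - 0.5 * b for a, b in zip(r1, r2)] for r1, r2 in zip(x, xxt_x)]
    return x


def muon_like_step(weight, momentum, grad, lr=0.02, beta=0.95):
    """One simplified step: accumulate momentum, orthogonalize it, apply it."""
    momentum = [[beta * mo + g for mo, g in zip(rm, rg)] for rm, rg in zip(momentum, grad)]
    update = orthogonalize(momentum)
    weight = [[w - lr * u for w, u in zip(rw, ru)] for rw, ru in zip(weight, update)]
    return weight, momentum


# Hypothetical 2x2 example: identity weights, zero momentum, one gradient.
w, mom = muon_like_step(
    weight=[[1.0, 0.0], [0.0, 1.0]],
    momentum=[[0.0, 0.0], [0.0, 0.0]],
    grad=[[3.0, 1.0], [1.0, 2.0]],
)
```

Because the orthogonalized update has all singular values near 1, every direction of the momentum matrix contributes equally to the step, regardless of how large its original singular value was.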

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

How the Optimizer Shapes Learned Solutions in Equivariant Neural Networks

Researchers demonstrate that the Muon optimizer significantly outperforms Adam when training equivariant neural networks, which encode geometric symmetries by design. Analysis of trained models reveals Muon produces solutions with more regular loss surfaces, higher weight ranks, and better-conditioned representations, suggesting optimizer choice substantially influences how neural networks learn geometric constraints.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Intrinsic Muon: Spectral Optimization on Riemannian Matrix Manifolds

Researchers introduce intrinsic Muon (iMuon), a unified optimization framework that extends the Muon optimizer to Riemannian manifolds while preserving symmetries and enabling closed-form solutions. The approach demonstrates applications in LLM fine-tuning, image classification, and subspace learning with convergence guarantees dependent only on manifold dimension rather than factor conditioning.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

MuonRec: Shifting the Optimizer Paradigm Beyond Adam in Scalable Generative Recommendation

Researchers introduce MuonRec, an optimization framework for recommendation systems that significantly outperforms the widely used Adam and AdamW optimizers. It reduces training steps by 32.4% on average and improves NDCG@10 by 12.6% across traditional and generative recommenders.
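
NDCG@10, the ranking metric cited above, can be sketched as follows. This is the standard linear-gain definition, not code from the paper; the function names and the example relevance lists are illustrative assumptions, and the ideal ranking is computed from the same list, which assumes that list contains every relevant item.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k positions (linear gain)."""
    # Position 0 is rank 1, so the discount is log2(rank + 1) = log2(index + 2).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical rankings: 1 marks the relevant item, 0 an irrelevant one.
perfect = ndcg_at_k([1, 0, 0])   # relevant item ranked first
second = ndcg_at_k([0, 1, 0])    # relevant item ranked second
```

A ranking that places the relevant item first scores 1.0, and the score drops as the item moves down the list, so a higher NDCG@10 means relevant items appear closer to the top of the first ten results.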