y0news

#optimizer-theory News & Analysis

1 article tagged with #optimizer-theory. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · 10h ago · 6/10

Open Problem: Is AdamW Effective Under Heavy-Tailed Noise?

Researchers identify a critical theoretical gap in AdamW, the dominant optimizer for training large language models, questioning whether it can handle heavy-tailed gradient noise common in LLM pretraining. The paper formulates this as an open problem and provides partial theoretical insights, while noting that simpler optimizers like Lion and Muon have already achieved convergence guarantees under heavy-tailed conditions.
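To make the contrast in the summary concrete, here is a minimal sketch of the textbook AdamW and Lion update rules applied to one scalar parameter, fed gradients corrupted by heavy-tailed (standard Cauchy, infinite-variance) noise. This is an illustration under stated assumptions, not the paper's analysis. The function names (`adamw_step`, `lion_step`, `cauchy`, `simulate`), the toy objective f(x) = x²/2, the noise model, and all hyperparameter defaults are hypothetical choices made for the example. Muon is not sketched.

```python
import math
import random


def adamw_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, wd=0.01):
    """One AdamW step on a scalar parameter (decoupled weight decay)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment EMA
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment EMA
    m_hat = m / (1 - beta1 ** t)                # bias corrections
    v_hat = v / (1 - beta2 ** t)
    # Step size depends on the ratio m_hat / sqrt(v_hat) of moment estimates.
    theta = theta - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * theta)
    return theta, m, v


def lion_step(theta, grad, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.01):
    """One Lion step on a scalar parameter: only the sign of c is used."""
    c = beta1 * m + (1 - beta1) * grad
    sign = (c > 0) - (c < 0)                    # sign(c) in {-1, 0, 1}
    theta = theta - lr * (sign + wd * theta)    # |step| <= lr * (1 + wd*|theta|)
    m = beta2 * m + (1 - beta2) * grad          # momentum updated after the step
    return theta, m


def cauchy(rng):
    """Standard Cauchy sample: heavy-tailed, infinite variance (hypothetical noise model)."""
    return math.tan(math.pi * (rng.random() - 0.5))


def simulate(steps=2000, seed=0):
    """Minimize the toy objective f(x) = x^2 / 2 from x = 1 with Cauchy-noised gradients."""
    rng = random.Random(seed)
    x_adamw, m_a, v_a = 1.0, 0.0, 0.0
    x_lion, m_l = 1.0, 0.0
    for t in range(1, steps + 1):
        noise = cauchy(rng)                     # same noise sample for both optimizers
        x_adamw, m_a, v_a = adamw_step(x_adamw, x_adamw + noise, m_a, v_a, t)
        x_lion, m_l = lion_step(x_lion, x_lion + noise, m_l)
    return x_adamw, x_lion


if __name__ == "__main__":
    print(simulate())
```

The sketch shows one intuition, not a proof. An outlier gradient reaches Lion's update only through a sign, so every step has magnitude `lr` (plus weight decay). In AdamW, the same outlier also inflates the second-moment estimate `v`, which shrinks subsequent steps until the moving average decays. Whether AdamW still converges under such noise is the open question the paper poses, and a toy simulation like this cannot settle it.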