AI · Bullish · arXiv – CS AI · 6h ago · 6/10
Revealing Modular Gradient Noise Imbalance in LLMs: Calibrating Adam via Signal-to-Noise Ratio
Researchers present MoLS (Module-wise Learning Rate Scaling via SNR), a technique that automatically calibrates Adam updates for each module of a large language model (LLM) based on that module's gradient signal-to-noise ratio (SNR). The method targets the optimization difficulties caused by gradient heterogeneity across LLM components and requires no manual per-module tuning. It achieves performance comparable to hand-tuned approaches while remaining compatible with memory-efficient training.
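To make the idea of module-wise SNR scaling concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm: the summary gives no formulas, so the SNR estimator (squared norm of the mean gradient over the average squared deviation), the square-root scaling rule, the median reference, and the clipping bounds are all assumptions chosen for illustration. The names `module_snr` and `snr_lr_scales` are hypothetical.

```python
import math
from statistics import median


def module_snr(grad_samples, eps=1e-12):
    """Estimate a gradient SNR for one module (illustrative estimator, an assumption).

    grad_samples: list of equal-length lists of floats, one flattened
    gradient per mini-batch.
    """
    n = len(grad_samples)
    dim = len(grad_samples[0])
    # Signal: squared norm of the mean gradient across mini-batches.
    mean = [sum(g[j] for g in grad_samples) / n for j in range(dim)]
    signal = sum(m * m for m in mean)
    # Noise: average squared deviation from that mean, summed over coordinates.
    noise = sum((g[j] - mean[j]) ** 2 for g in grad_samples for j in range(dim)) / n
    return signal / (noise + eps)


def snr_lr_scales(snr_by_module, lo=0.1, hi=10.0, eps=1e-12):
    """Map per-module SNRs to learning-rate multipliers.

    Assumed rule (not from the source): scale by sqrt(SNR / median SNR),
    so noisier modules take smaller steps, then clip to [lo, hi].
    """
    ref = max(median(snr_by_module.values()), eps)
    scales = {}
    for name, snr in snr_by_module.items():
        raw = math.sqrt(snr / ref)
        scales[name] = min(hi, max(lo, raw))
    return scales


# Toy gradients for two hypothetical modules.
samples = {
    "attention": [[2.0, 0.0], [0.0, 0.0]],  # noisier relative to its mean
    "mlp": [[3.0], [1.0]],                  # cleaner relative to its mean
}
snrs = {name: module_snr(g) for name, g in samples.items()}
scales = snr_lr_scales(snrs)
```

In a real training loop, multipliers like these could be applied to a base learning rate through per-module parameter groups in the optimizer. Whether higher-SNR modules should receive larger or smaller steps is a design choice that this sketch fixes arbitrarily; the paper itself should be consulted for the actual rule.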