Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation
🤖 AI Summary
Researchers introduce LoRA-Pre, a memory-efficient optimizer that reduces the memory overhead of training large language models by keeping a low-rank approximation of its momentum states. The method achieves the best performance on Llama models from 60M to 1B parameters while using only 1/8 the rank of baseline methods.
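To make the core idea concrete, here is a minimal sketch of keeping the momentum state in a rank-r subspace instead of at full size. This is an illustration of the general technique, not the paper's actual algorithm: the function name, the fixed projection `P`, and all hyperparameters are hypothetical, and the toy rank below is arbitrary.

```python
import numpy as np

def low_rank_momentum_step(W, grad, P, m_lr, lr=1e-3, beta=0.9):
    """Illustrative momentum step whose state lives in a rank-r subspace.

    W    : (m, n) weight matrix
    grad : (m, n) full-size gradient of the loss w.r.t. W
    P    : (m, r) orthonormal basis for the subspace (hypothetical: fixed
           here; in practice it might be refreshed from recent gradients)
    m_lr : (r, n) momentum state -- the only optimizer state stored per matrix
    """
    g_lr = P.T @ grad                         # project gradient: (r, n) not (m, n)
    m_lr = beta * m_lr + (1.0 - beta) * g_lr  # EMA on the small state
    W = W - lr * (P @ m_lr)                   # lift the update back to full size
    return W, m_lr

# Toy usage: a 512x512 matrix with rank-64 momentum state (8x smaller).
rng = np.random.default_rng(0)
m, n, r = 512, 512, 64
W = rng.standard_normal((m, n)) * 0.02
P, _ = np.linalg.qr(rng.standard_normal((m, r)))  # random orthonormal basis
m_lr = np.zeros((r, n))
for _ in range(3):
    grad = rng.standard_normal((m, n))            # stand-in for a real gradient
    W, m_lr = low_rank_momentum_step(W, grad, P, m_lr)
```

The memory saving comes from storing an (r, n) state instead of an (m, n) one; how the subspace is chosen and updated is where methods in this family differ.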
Key Takeaways
- LoRA-Pre reduces the optimizer's memory footprint by decomposing momentum matrices into low-rank subspaces while maintaining optimization performance.
- The method achieves the highest performance across all tested model sizes in the Llama architecture family (60M to 1B parameters).
- LoRA-Pre is notably efficient, achieving comparable results with only 1/8 the rank of baseline methods.
- In fine-tuning, LoRA-Pre outperforms standard LoRA by 3.14 points on Llama-3.1-8B and 6.17 points on Llama-2-7B.
- The approach reframes the exponential moving averages inside optimizers as linear regressors trained by online gradient flow (see the sketch after this list).
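The last takeaway has a simple concrete reading, sketched below. This is the standard identity, and the paper's exact regression formulation may differ: the EMA update is precisely one gradient step, with step size (1 - β), on a squared loss that fits the momentum vector to the latest gradient.

```latex
% EMA as one step of online gradient descent on a least-squares fit
\mathcal{L}_t(m) = \tfrac{1}{2}\,\lVert m - g_t \rVert^2
\qquad\Rightarrow\qquad
m_t = m_{t-1} - (1-\beta)\,\nabla_m \mathcal{L}_t(m_{t-1})
    = \beta\, m_{t-1} + (1-\beta)\, g_t
```

On this reading, storing only a low-rank momentum amounts to constraining that regressor to a subspace, which is presumably what connects this framing to the momentum decomposition in the first takeaway.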
#machine-learning #optimization #memory-efficiency #llama #low-rank #fine-tuning #pre-training #lora #ai-research
Read Original → via arXiv – CS AI