
Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation

arXiv – CS AI | Zhengbo Wang, Jian Liang, Ran He, Zilei Wang, Tieniu Tan

AI Summary

Researchers introduce LoRA-Pre, an optimizer that cuts the memory overhead of training large language models by storing momentum states as low-rank approximations. The method outperforms baselines on Llama models from 60M to 1B parameters while using only 1/8 their rank.

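The summary does not spell out the paper's exact algorithm, but the core memory trick can be sketched in general terms: project gradients into a rank-r subspace, keep the momentum EMA there, and lift it back only when applying the update. In the sketch below, `low_rank_momentum_step` and the fixed projection `P` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def low_rank_momentum_step(W, grad, M_low, P, lr=0.01, beta=0.9):
    """One step of momentum SGD with momentum stored in a rank-r subspace.

    W:     (m, n) parameter matrix
    grad:  (m, n) gradient
    M_low: (r, n) momentum kept in the low-rank subspace
    P:     (r, m) projection with orthonormal rows (hypothetical; e.g.
           obtained from an SVD of recent gradients)
    """
    g_low = P @ grad                            # project gradient to (r, n)
    M_low = beta * M_low + (1 - beta) * g_low   # EMA lives in the subspace
    update = P.T @ M_low                        # lift back to full (m, n)
    return W - lr * update, M_low

# Momentum costs r*n floats instead of m*n: an 8x saving at r = m/8.
rng = np.random.default_rng(0)
m, n, r = 64, 32, 8
W = rng.standard_normal((m, n))
P = np.linalg.qr(rng.standard_normal((m, r)))[0].T  # orthonormal rows
M_low = np.zeros((r, n))
W, M_low = low_rank_momentum_step(W, rng.standard_normal((m, n)), M_low, P)
print(M_low.shape)  # (8, 32)
```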
Key Takeaways
  • LoRA-Pre optimizer reduces memory footprint by decomposing momentum matrices into low-rank subspaces while maintaining optimization performance.
  • The method achieves the highest performance across all tested model sizes in the Llama architecture family (60M to 1B parameters).
  • LoRA-Pre demonstrates remarkable efficiency by achieving comparable results using only 1/8 the rank of baseline methods.
  • In fine-tuning scenarios, LoRA-Pre outperforms standard LoRA by 3.14 points on Llama-3.1-8B and 6.17 points on Llama-2-7B.
  • The approach reframes exponential moving averages in optimizers as training linear regressors via online gradient flow.
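The last takeaway is easy to verify directly: the EMA update m ← βm + (1 − β)g is exactly one online gradient-descent step on the least-squares loss L(m) = ½‖m − g‖² with learning rate 1 − β, since ∇L(m) = m − g. A quick numerical check:

```python
import numpy as np

# EMA momentum vs. one gradient step on the regression loss
# L(m) = 0.5 * ||m - g||^2 with step size (1 - beta).
beta = 0.9
rng = np.random.default_rng(1)
m = np.zeros(4)
for _ in range(5):
    g = rng.standard_normal(4)
    ema = beta * m + (1 - beta) * g   # classic EMA update
    gd = m - (1 - beta) * (m - g)     # one GD step on L(m)
    assert np.allclose(ema, gd)       # the two updates coincide
    m = ema
print("EMA step == one gradient step on a least-squares regressor")
```

This identity is what lets the optimizer's momentum be treated as an online linear regressor, which in turn makes low-rank approximation of its state a natural compression target.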