AIBullish — arXiv · CS AI · 6h ago
Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation
Researchers introduce LoRA-Pre, an optimizer that cuts the memory overhead of training large language models by storing a low-rank approximation of its momentum states. The method outperforms baseline optimizers on Llama models from 60M to 1B parameters while using only 1/8 the rank of those baselines.
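For intuition only, here is a minimal NumPy sketch of the general idea of keeping first-order momentum in a rank-r subspace instead of at full parameter size. This is not the paper's algorithm or released code: the class name `LowRankMomentumSGD`, the SVD-based choice of basis, and all hyperparameters are illustrative assumptions.

```python
# Hypothetical sketch of low-rank momentum (illustration, not LoRA-Pre itself).
# Memory: a basis P of shape (m, r) plus momentum of shape (r, n), i.e. r*(m+n)
# floats, instead of a full (m, n) momentum buffer.
import numpy as np

class LowRankMomentumSGD:
    def __init__(self, shape, rank=4, lr=1e-2, beta=0.9):
        self.rank = rank
        self.lr = lr
        self.beta = beta
        self.P = None                          # projection basis, shape (m, r)
        self.m = np.zeros((rank, shape[1]))    # momentum kept in the subspace

    def step(self, W, grad):
        # Fit the basis from the gradient's top-r left singular vectors once;
        # practical methods typically refresh it every few hundred steps.
        if self.P is None:
            U, _, _ = np.linalg.svd(grad, full_matrices=False)
            self.P = U[:, :self.rank]          # (m, r)
        g_low = self.P.T @ grad                # (r, n): gradient in the subspace
        self.m = self.beta * self.m + (1.0 - self.beta) * g_low
        return W - self.lr * (self.P @ self.m)  # map the update back to full size

# Toy usage on a single weight matrix
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
grad = rng.standard_normal((64, 32))
opt = LowRankMomentumSGD(W.shape, rank=4)
W = opt.step(W, grad)
```

The sketch only illustrates why a rank-r momentum buffer is cheaper; how LoRA-Pre actually picks and updates its low-rank factors is described in the paper.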