y0news
#memory-efficiency4 articles
4 articles
AIBullisharXiv โ€“ CS AI ยท 4h ago7
๐Ÿง 

Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation

Researchers introduce LoRA-Pre, a memory-efficient optimizer that reduces memory overhead in training large language models by using low-rank approximation of momentum states. The method achieves superior performance on Llama models from 60M to 1B parameters while using only 1/8 the rank of baseline methods.

AINeutralarXiv โ€“ CS AI ยท 4h ago7
๐Ÿง 

Do LLMs Benefit From Their Own Words?

Research reveals that large language models don't significantly benefit from conditioning on their own previous responses in multi-turn conversations. The study found that omitting assistant history can reduce context lengths by up to 10x while maintaining response quality, and in some cases even improves performance by avoiding context pollution where models over-condition on previous responses.

AIBullisharXiv โ€“ CS AI ยท 4h ago5
๐Ÿง 

Smoothing DiLoCo with Primal Averaging for Faster Training of LLMs

Researchers propose Generalized Primal Averaging (GPA), a new optimization method that improves training speed for large language models by 8-10% over standard AdamW while using less memory. GPA unifies and enhances existing averaging-based optimizers like DiLoCo by enabling smooth iterate averaging at every step without complex two-loop structures.

AINeutralarXiv โ€“ CS AI ยท 4h ago0
๐Ÿง 

Bridging the Performance Gap Between Target-Free and Target-Based Reinforcement Learning

Researchers introduce iterated Shared Q-Learning (iS-QL), a new reinforcement learning method that bridges target-free and target-based approaches by using only the last linear layer as a target network while sharing other parameters. The technique achieves comparable performance to traditional target-based methods while maintaining the memory efficiency of target-free approaches.