AI · Bullish · arXiv – CS AI · 7h ago · 7/10
Memory-Efficient LLM Training with Dynamic Sparsity: From Stability to Practical Scaling
Researchers propose Sparse Memory-Efficient Training (SMET), a method that stabilizes Dynamic Sparse Training (DST) for large language models. It counters the optimization instability of DST with two techniques: optimizer warm-up and density-aware learning-rate scaling. The approach reduces memory consumption while keeping training stable, offering a practical alternative to dense model training.