AI · Bullish · arXiv – CS AI · 18h ago · 7/10
Sparrow: Sparse Rollout for Stable and Efficient Long-context RL of Large Language Models
Researchers introduce Sparrow, a dynamic sparsity scheduling method that speeds up reinforcement learning (RL) training of large language models by 2 to 2.4× while keeping training stable. The approach identifies a critical threshold in per-token actor-policy mismatch: keeping the mismatch below this threshold prevents training collapse during sparse rollout generation. Distillation techniques can yield further gains.
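The summary does not say how Sparrow measures mismatch or how its schedule reacts to the threshold. The sketch below is therefore a hypothetical illustration of the general idea, not the paper's method. It assumes that mismatch is the per-token absolute difference between actor and rollout log-probabilities for the sampled tokens, and that the scheduler lowers rollout sparsity when the mean mismatch exceeds a threshold and raises it otherwise. The names `per_token_mismatch` and `SparsityScheduler`, and every default value, are invented for illustration.

```python
from dataclasses import dataclass


def per_token_mismatch(actor_logprobs, rollout_logprobs):
    """Hypothetical metric: |log pi_actor(token) - log pi_rollout(token)| per sampled token."""
    if len(actor_logprobs) != len(rollout_logprobs):
        raise ValueError("log-prob sequences must be aligned token by token")
    return [abs(a - r) for a, r in zip(actor_logprobs, rollout_logprobs)]


@dataclass
class SparsityScheduler:
    """Hypothetical scheduler that keeps mean per-token mismatch under `threshold`."""
    sparsity: float = 0.5       # assumed knob: fraction of work skipped during rollout
    threshold: float = 0.05     # assumed critical mismatch level (not from the paper)
    step: float = 0.1           # how much sparsity changes per update
    min_sparsity: float = 0.0
    max_sparsity: float = 0.9

    def update(self, mismatches):
        """Adjust sparsity from one batch of per-token mismatches and return the new value."""
        if not mismatches:
            raise ValueError("need at least one token to measure mismatch")
        mean_mismatch = sum(mismatches) / len(mismatches)
        if mean_mismatch > self.threshold:
            # Above the threshold: back off sparsity to protect training stability.
            self.sparsity = max(self.min_sparsity, self.sparsity - self.step)
        else:
            # Below the threshold: increase sparsity to make rollouts cheaper.
            self.sparsity = min(self.max_sparsity, self.sparsity + self.step)
        return self.sparsity


if __name__ == "__main__":
    sched = SparsityScheduler()
    high = per_token_mismatch([-1.0, -2.0], [-1.2, -2.0])  # mean mismatch ~0.1
    print(sched.update(high))   # sparsity drops to about 0.4
    low = per_token_mismatch([-1.0], [-1.0])                # mean mismatch 0.0
    print(sched.update(low))    # sparsity rises back to about 0.5
```

The threshold acts as a guardrail. Sparsity is pushed up only while the measured mismatch stays in the assumed safe region, which mirrors, in simplified form, the summary's claim that exceeding a critical mismatch level is what leads to collapse.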