Periodic Asynchrony: An On-Policy Approach for Accelerating LLM Reinforcement Learning
Researchers propose a new asynchronous framework for LLM reinforcement learning that separates inference and training deployment, achieving 3-5x improvement in training throughput. The approach maintains on-policy correctness while enabling concurrent inference and training through a producer-consumer pipeline architecture.