AINeutralarXiv โ CS AI ยท 7h ago6/10
๐ง
FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling
Researchers introduce Sol-RL, a two-stage reinforcement learning framework that combines FP4 quantization for efficient rollout generation with BF16 precision for policy optimization in diffusion models. The approach achieves up to 4.64x training acceleration while maintaining alignment quality, addressing the computational bottleneck of scaling RL-based post-training on large foundational models like FLUX.1.