AIBullish · arXiv CS AI · 4h ago · 7/10
Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation
Researchers introduce Lightning OPD, an offline on-policy distillation framework that eliminates the need for live teacher inference servers during large language model post-training. By enforcing "teacher consistency" (using the same teacher model for both supervised fine-tuning and distillation), the method matches the performance of standard OPD while delivering a 4x speedup and significantly reducing infrastructure costs.
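The core idea above can be illustrated with a small sketch: if the teacher's per-token log-probabilities are precomputed and cached, the distillation loss can be evaluated at training time without querying a live teacher server. This is an illustrative toy, not the paper's actual implementation; the function names, the forward-KL choice, and the cache format are all assumptions.

```python
import math

def softmax(logits):
    """Convert a list of logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def offline_distill_loss(student_logits, cached_teacher_logprobs):
    """Per-token KL(teacher || student) against cached teacher log-probs.

    In an offline setup, the teacher's log-probs are computed once and
    stored, so no teacher inference server is needed during training.
    Both arguments are lists of per-token vocabulary distributions
    (hypothetical layout, chosen for this sketch).
    """
    total = 0.0
    for s_logits, t_logprobs in zip(student_logits, cached_teacher_logprobs):
        s_probs = softmax(s_logits)
        t_probs = [math.exp(lp) for lp in t_logprobs]
        # KL(T || S) = sum_v  T(v) * (log T(v) - log S(v))
        total += sum(tp * (lp - math.log(sp))
                     for tp, lp, sp in zip(t_probs, t_logprobs, s_probs))
    return total / len(student_logits)
```

When the student already matches the cached teacher distribution, the loss is zero; any mismatch yields a positive value, which gradient descent would then reduce.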