AI · Bullish · arXiv – CS AI · Apr 15 · 7/10
Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation
Researchers introduce Lightning OPD, an offline on-policy distillation (OPD) framework that removes the need for a live teacher inference server during post-training of large language models. By enforcing 'teacher consistency' (using the same teacher model for both supervised fine-tuning and distillation), the method performs comparably to standard OPD while delivering a 4x speedup and significantly reducing infrastructure costs.
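To make the "no live teacher" idea concrete, here is a minimal sketch of how a distillation loss could be computed from teacher log-probabilities that were cached ahead of time. It is an illustration under stated assumptions, not the paper's implementation: the summary does not describe the loss or the data layout, so the sampled-token reverse-KL estimate, the `TeacherCache` mapping, and both function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical cache (an assumption, not from the paper): maps a rollout id to
# the teacher's log-probabilities of the tokens sampled in that rollout.
# It is filled once, offline, so no teacher server is queried during training.
TeacherCache = Dict[str, List[float]]


def sampled_reverse_kl(student_logps: List[float], teacher_logps: List[float]) -> float:
    """Average of log p_student(token) - log p_teacher(token) over a rollout.

    When the tokens were sampled from the student, this is a Monte Carlo
    estimate of the per-token reverse KL divergence KL(student || teacher).
    """
    if len(student_logps) != len(teacher_logps):
        raise ValueError("student and teacher log-probs must align token by token")
    if not student_logps:
        raise ValueError("empty rollout")
    diffs = [s - t for s, t in zip(student_logps, teacher_logps)]
    return sum(diffs) / len(diffs)


def offline_distill_loss(
    batch: List[Tuple[str, List[float]]], cache: TeacherCache
) -> float:
    """Mean loss over a batch, reading teacher log-probs from the cache only."""
    losses = [sampled_reverse_kl(student_logps, cache[rollout_id])
              for rollout_id, student_logps in batch]
    return sum(losses) / len(losses)


# Toy example: one rollout of two tokens.
cache: TeacherCache = {"r0": [math.log(0.5), math.log(0.25)]}
batch = [("r0", [math.log(0.5), math.log(0.5)])]
loss = offline_distill_loss(batch, cache)
```

In this toy case the student and teacher agree on the first token and differ by a factor of two on the second, so the loss is half of log 2. In a real training loop the student log-probabilities would come from the model's forward pass and carry gradients, while the cached teacher values would stay constant.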