
#on-policy-distillation News & Analysis

2 articles tagged with #on-policy-distillation. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 4h ago · 7/10

Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation

Researchers introduce Lightning OPD, an offline on-policy distillation framework that eliminates the need for live teacher inference servers during large language model post-training. By enforcing 'teacher consistency' (using the same teacher model for both supervised fine-tuning and distillation), the method achieves comparable performance to standard OPD while delivering a 4x speedup and significantly reducing infrastructure costs.
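
The summary describes the offline variant only at a high level. As a rough illustration (not taken from the paper), the core idea can be sketched as: the teacher scores student-generated rollouts once, ahead of training, and the student is then optimized against those cached per-token teacher log-probs, so no live teacher server is needed during the training loop. All names and shapes below are hypothetical.

```python
import torch
import torch.nn.functional as F

# Hypothetical offline OPD step: the teacher's per-token log-probs were
# computed once over student-generated rollouts and cached, so no live
# teacher inference server is required during post-training.
def offline_opd_loss(student_logits, cached_teacher_logprobs, response_mask):
    """
    student_logits:          [batch, seq, vocab] logits from the student model.
    cached_teacher_logprobs: [batch, seq, vocab] precomputed teacher log-probs.
    response_mask:           [batch, seq] 1 for response tokens, 0 for prompt/pad.
    Returns the mean per-token reverse KL(student || teacher) over response tokens.
    """
    student_logprobs = F.log_softmax(student_logits, dim=-1)
    # Token-level reverse KL against the cached teacher distribution.
    kl = (student_logprobs.exp() * (student_logprobs - cached_teacher_logprobs)).sum(-1)
    return (kl * response_mask).sum() / response_mask.sum().clamp(min=1)
```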

AI · Neutral · arXiv – CS AI · 4h ago · 6/10

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Researchers investigate on-policy distillation (OPD) dynamics in large language model training, identifying two critical success conditions: compatible thinking patterns between student and teacher models, and genuine new capabilities from the teacher. The study reveals that successful OPD relies on token-level alignment and proposes recovery strategies for failing distillation scenarios.
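
The 'token-level alignment' condition suggests a simple diagnostic before committing to a distillation run: compare student and teacher distributions token by token on student-generated responses. A minimal sketch, with hypothetical names and no claim to match the paper's recipe:

```python
import torch
import torch.nn.functional as F

# Hypothetical alignment check: top-1 agreement and mean per-token KL between
# teacher and student on the student's own rollouts.
def token_alignment_report(student_logits, teacher_logits, response_mask):
    s_logp = F.log_softmax(student_logits, dim=-1)
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    agree = (s_logp.argmax(-1) == t_logp.argmax(-1)).float()
    kl = (t_logp.exp() * (t_logp - s_logp)).sum(-1)  # forward KL(teacher || student)
    denom = response_mask.sum().clamp(min=1)
    return {
        "top1_agreement": (agree * response_mask).sum() / denom,
        "mean_token_kl": (kl * response_mask).sum() / denom,
    }
```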