AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
Researchers investigate the dynamics of on-policy distillation (OPD) in large language model training and identify two conditions critical to its success: the student and teacher models must have compatible thinking patterns, and the teacher must offer genuinely new capabilities. The study finds that successful OPD depends on token-level alignment between student and teacher, and it proposes strategies for recovering distillation runs that fail.
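To make the "token-level" framing concrete, here is a minimal sketch of a per-token signal often used in on-policy distillation. The student samples a sequence, and at each position of that student-generated prefix the reverse KL divergence from the student's next-token distribution to the teacher's is computed. This is a generic illustration under stated assumptions, not the paper's recipe. The function names and the choice of reverse KL are assumptions. Real training would obtain logits from actual models and backpropagate through the student.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax over one next-token distribution.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def per_token_reverse_kl(
    student_logits: Sequence[Sequence[float]],
    teacher_logits: Sequence[Sequence[float]],
) -> List[float]:
    """Reverse KL(student || teacher) at each position of a student-sampled sequence.

    Assumption (illustrative only): both logit lists were computed on the SAME
    prefix, which the student generated itself (the "on-policy" part), over a
    shared vocabulary.
    """
    if len(student_logits) != len(teacher_logits):
        raise ValueError("student and teacher must score the same positions")
    losses = []
    for s_row, t_row in zip(student_logits, teacher_logits):
        log_p = log_softmax(s_row)  # student distribution (log-probs)
        log_q = log_softmax(t_row)  # teacher distribution (log-probs)
        # KL(p || q) = sum_v p(v) * (log p(v) - log q(v))
        losses.append(sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q)))
    return losses


if __name__ == "__main__":
    # Position 0: student and teacher agree, so the loss is 0.
    # Position 1: student is uniform (0.5, 0.5), teacher is (0.25, 0.75).
    student = [[1.0, 2.0], [0.0, 0.0]]
    teacher = [[1.0, 2.0], [0.0, math.log(3.0)]]
    print(per_token_reverse_kl(student, teacher))
```

Because the loss is reported per position, it exposes where the student and teacher diverge token by token. This is one simple way to inspect the kind of alignment the study points to.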