Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
Researchers investigate the dynamics of on-policy distillation (OPD) in large language model training, identifying two conditions critical to its success: compatible thinking patterns between the student and teacher models, and a teacher that offers capabilities the student genuinely lacks. The study finds that successful OPD hinges on token-level alignment between the two models, and it proposes recovery strategies for distillation runs that would otherwise fail.
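For context, the sketch below shows what a token-level OPD training step commonly looks like: the student samples its own continuation (the "on-policy" part), and is then trained to match the teacher's per-token distribution on that sampled sequence via a reverse KL. The function name `opd_step`, the reverse-KL choice, and all hyperparameters are illustrative assumptions about the standard OPD recipe, not details taken from this paper.

```python
import torch
import torch.nn.functional as F

def opd_step(student, teacher, prompt_ids, max_new_tokens=32, temperature=1.0):
    """One on-policy distillation update (a sketch, not the paper's recipe).

    Assumes `student` and `teacher` map token ids [B, T] to logits [B, T, V]
    over a SHARED vocabulary -- one concrete form of the "compatible thinking
    patterns" requirement.
    """
    # 1) Student samples its own rollout: this is what makes OPD on-policy.
    with torch.no_grad():
        seq = prompt_ids
        for _ in range(max_new_tokens):
            next_logits = student(seq)[:, -1, :] / temperature
            next_tok = torch.multinomial(F.softmax(next_logits, dim=-1), num_samples=1)
            seq = torch.cat([seq, next_tok], dim=-1)

    # 2) Re-score the rollout with both models. Positions p-1 .. T-2 are the
    #    ones that predict the sampled tokens p .. T-1.
    start = prompt_ids.size(1) - 1
    s_logp = F.log_softmax(student(seq)[:, start:-1, :], dim=-1)
    with torch.no_grad():
        t_logp = F.log_softmax(teacher(seq)[:, start:-1, :], dim=-1)

    # 3) Token-level alignment: per-position reverse KL(student || teacher),
    #    averaged over the sampled positions. (Optimizer step omitted.)
    loss = (s_logp.exp() * (s_logp - t_logp)).sum(dim=-1).mean()
    loss.backward()
    return loss.item()
```

Note that if the two models use different tokenizers, the per-token KL above is not even well defined, which illustrates why mismatched student-teacher setups are one way the compatibility condition can fail.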