Less is More: Early Stopping Rollout for On-Policy Distillation
Researchers propose Early Stopping Rollout (ESR), a novel distillation technique that improves on-policy student model training by limiting rollout generation to initial response tokens. The method addresses "Off-policy Teacher Decay," where teachers lose effectiveness on later tokens, achieving better performance with higher GPU efficiency than standard approaches.