AINeutralarXiv – CS AI · 3h ago6/10
🧠
ADWIN: Adaptive Windows for Horizon-Aware On-Policy Distillation
ADWIN is a new framework for on-policy distillation that optimizes training efficiency by adaptively adjusting rollout lengths instead of requiring full completions for every update. The method reduces training costs by up to 4.1x while maintaining or improving accuracy on math and code reasoning tasks by identifying when shorter teacher-anchored sequences contain sufficient signal for learning.