AI · Neutral · arXiv – CS AI · 7h ago · 6/10
When Context Returns: Toward Robust Internalization in On-Policy Distillation
Researchers identify a critical failure mode in on-policy distillation: reintroducing privileged context (such as a system prompt) to a distilled student model degrades its performance, even on tasks it had previously solved. They propose a lightweight consistency regularizer that uses stop-gradient anchoring and forward KL divergence to achieve 'context removability,' so that the model internalizes the context yet remains stable when that context reappears.
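
To make the regularizer concrete, here is a minimal sketch of a forward-KL consistency term with a stop-gradient anchor for one next-token distribution. It is an illustration under stated assumptions, not the paper's implementation: the summary does not say which branch is anchored, so this sketch assumes the student's context-free output is the frozen anchor and the with-context output is pulled toward it. The function names (`softmax`, `forward_kl`, `consistency_term`) are hypothetical, and the stop-gradient is emulated by returning a zero gradient for the anchor branch.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward_kl(p, q, eps=1e-12):
    # Forward KL divergence KL(p || q) between two categorical distributions.
    return sum(
        pi * (math.log(pi + eps) - math.log(qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def consistency_term(anchor_logits, context_logits):
    # ASSUMPTION: anchor = student output without the privileged context,
    # treated as a constant (stop-gradient); the with-context branch is trained.
    p = softmax(anchor_logits)
    q = softmax(context_logits)
    loss = forward_kl(p, q)
    # Analytic gradient of KL(p || softmax(z)) with respect to z is q - p.
    grad_context = [qi - pi for pi, qi in zip(p, q)]
    # Stop-gradient: nothing flows back into the anchor branch.
    grad_anchor = [0.0] * len(anchor_logits)
    return loss, grad_context, grad_anchor

loss, grad_context, grad_anchor = consistency_term([2.0, 0.0, -1.0], [0.5, 0.5, 0.5])
print(f"loss={loss:.4f}")
print("grad wrt with-context logits:", [round(g, 4) for g in grad_context])
print("grad wrt anchor logits:", grad_anchor)
```

In an autodiff framework the zero anchor gradient would come from a stop-gradient or detach operation rather than being written by hand. The weight given to this term relative to the distillation loss is not specified in the summary.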