AINeutralarXiv – CS AI · 8h ago6/10
🧠
Repeated post-training is not Self-improving: Diagnosing Scientific Amnesia in Continual DPO Pipelines
Researchers identify 'scientific amnesia' as a critical failure mode in continual DPO (Direct Preference Optimization) training pipelines where LLMs preserve learned behaviors but fail to accumulate reusable methodological knowledge across sequential training campaigns. Testing five strategy proposers on a 30-campaign benchmark reveals that most approaches degrade performance, with only conservative rule-based scheduling showing consistent improvement.