AI · Neutral · arXiv – CS AI · 6h ago · 6/10
Beyond Uniform Forgetting: A Study of Sequential Direct Preference Optimization Across Preference Settings
Researchers studying sequential Direct Preference Optimization (DPO) in language models find that a later training stage does not uniformly degrade previously learned preferences. Instead, outcomes vary with how compatible the objectives are and how strong the training signal is. In experiments with Llama-3.1-8B-Instruct, changes in the earlier preference range from degradation through stability to positive transfer, and a pair-level analysis shows that aggregate metrics can mask heterogeneous effects across individual preference pairs.
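To make the point about aggregate metrics concrete, here is a minimal Python sketch of a pair-level comparison. It is an illustration under stated assumptions, not the study's method: the margins are invented toy values, and the names `before`, `after`, `classify`, and the tolerance `tol` are hypothetical.

```python
from statistics import mean

# Hypothetical per-pair preference margins (chosen score minus rejected score)
# for the first objective, measured before and after a second DPO stage.
# These numbers are invented for illustration; they are not from the study.
before = [0.8, 0.6, 0.7, 0.5]
after = [0.2, 1.0, 0.7, 0.9]


def classify(delta: float, tol: float = 0.05) -> str:
    """Label a single pair's margin change; tol is an assumed threshold."""
    if delta > tol:
        return "improved"
    if delta < -tol:
        return "degraded"
    return "stable"


# Per-pair change in margin, then the single aggregate number.
deltas = [a - b for b, a in zip(before, after)]
aggregate_change = mean(deltas)
labels = [classify(d) for d in deltas]

print(f"aggregate change: {aggregate_change:+.3f}")
for i, (d, label) in enumerate(zip(deltas, labels)):
    print(f"pair {i}: {d:+.2f} ({label})")
```

In this toy data the aggregate change is slightly positive, yet one pair degrades sharply, one is unchanged, and two improve, which is the kind of spread a single averaged metric would hide.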
🧠 Llama