🧠 AI | ⚪ Neutral | Importance 6/10

Random Is Hard to Beat: Active Selection in online DPO with Modern LLMs

arXiv – CS AI | Giyeong Oh, Junghyun Lee, Jaehyun Park, Youngjae Yu, Wonho Bae, Junhyug Noh | April 6, 2026 at 04:00 AM

🤖 AI Summary

A paper posted to arXiv reports that Active Preference Learning (APL) yields only minimal improvements over random sampling when modern LLMs are trained with online Direct Preference Optimization (DPO). According to the study, random sampling performs nearly as well as sophisticated active selection methods while costing less compute and avoiding capability degradation.

Key Takeaways

→ Active Preference Learning yields negligible improvements in win rates compared to simple random sampling when training modern LLMs.
→ Modern LLMs' strong pre-training priors limit the effectiveness of sophisticated post-training data selection strategies.
→ Win-rate improvements from APL can paradoxically coincide with degradation in general LLM capabilities, as measured by standard benchmarks.
→ Random sampling provides 'cheap diversity' that is difficult to justify replacing with computationally expensive active selection methods.
→ The research challenges the assumption that more sophisticated training data selection automatically leads to better LLM performance.
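
To make the comparison in the takeaways concrete, here is a minimal Python sketch of the two selection strategies. It is not the paper's method: the uncertainty rule (pick the pairs whose DPO implicit reward margin is closest to zero) is one common active-selection heuristic, assumed here for illustration only. The names `CandidatePair`, `implicit_margin`, `select_random`, and `select_uncertain` are hypothetical, and the log-probabilities are toy numbers rather than model outputs.

```python
import random
from dataclasses import dataclass


@dataclass
class CandidatePair:
    """Two responses (a, b) to one prompt, with summed token log-probs.

    Hypothetical container: in practice these values would come from the
    current policy and a frozen reference model.
    """
    prompt_id: int
    policy_logp_a: float
    policy_logp_b: float
    ref_logp_a: float
    ref_logp_b: float


def implicit_margin(pair: CandidatePair, beta: float = 0.1) -> float:
    """DPO implicit reward of response a minus that of response b.

    The implicit reward is beta * (log pi(y|x) - log pi_ref(y|x)).
    """
    reward_a = beta * (pair.policy_logp_a - pair.ref_logp_a)
    reward_b = beta * (pair.policy_logp_b - pair.ref_logp_b)
    return reward_a - reward_b


def select_random(pairs, k, seed=0):
    """Baseline: pick k pairs uniformly at random, no model scoring needed."""
    rng = random.Random(seed)
    return rng.sample(pairs, k)


def select_uncertain(pairs, k, beta=0.1):
    """Assumed active heuristic: pick the k pairs the policy is least sure about.

    Requires scoring every candidate under both models, which is where the
    extra compute cost of active selection comes from.
    """
    return sorted(pairs, key=lambda p: abs(implicit_margin(p, beta)))[:k]


def make_toy_pool(n, seed=0):
    """Build a pool of pairs with random toy log-probs (not real model data)."""
    rng = random.Random(seed)
    return [
        CandidatePair(
            prompt_id=i,
            policy_logp_a=rng.uniform(-50.0, -10.0),
            policy_logp_b=rng.uniform(-50.0, -10.0),
            ref_logp_a=rng.uniform(-50.0, -10.0),
            ref_logp_b=rng.uniform(-50.0, -10.0),
        )
        for i in range(n)
    ]


if __name__ == "__main__":
    pool = make_toy_pool(100)
    for name, batch in [
        ("random", select_random(pool, 10)),
        ("uncertain", select_uncertain(pool, 10)),
    ]:
        mean_abs = sum(abs(implicit_margin(p)) for p in batch) / len(batch)
        print(f"{name}: mean |margin| = {mean_abs:.3f}")
```

The random baseline touches no model at all, while the active variant needs a scoring pass over the whole candidate pool before each update. That asymmetry is the cost the summary refers to when it calls random sampling's diversity "cheap".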