arXiv – CS AI · 6d ago
When Data is the Algorithm: A Systematic Study and Curation of Preference Optimization Datasets
Researchers conducted the first comprehensive analysis of open-source direct preference optimization (DPO) datasets used to align large language models and found significant variation in quality across them. Based on that analysis, they created UltraMix, a curated dataset that is 30% smaller than existing options while delivering superior performance across benchmarks.