AI · Neutral · arXiv – CS AI · 3h ago · 6/10
Aligning Language Model Benchmarks with Pairwise Preferences
Researchers introduce BenchAlign, a method that automatically recalibrates language model benchmarks using preference data to better predict real-world performance. The approach learns optimal weightings for benchmark questions and can rank unseen models according to human preferences, addressing the gap between traditional benchmark scores and practical utility.
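To make the idea of learning per-question weightings from preference data concrete, here is a minimal sketch. It is not the BenchAlign implementation: it assumes a simple Bradley-Terry style logistic model fitted by gradient descent, and the names `fit_question_weights` and `rank_models`, the hyperparameters, and the data layout are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_question_weights(scores, prefs, lr=0.5, steps=2000, l2=1e-3):
    """Learn one weight per benchmark question from pairwise preferences.

    scores: array of shape (n_models, n_questions), per-question results in [0, 1].
    prefs:  list of (winner_idx, loser_idx) pairs taken from preference data.
    Assumption: a Bradley-Terry style model, not the paper's actual objective.
    """
    scores = np.asarray(scores, dtype=float)
    winners = np.array([w for w, _ in prefs])
    losers = np.array([l for _, l in prefs])
    # For each pair, the preferred model's results minus the other model's.
    diffs = scores[winners] - scores[losers]
    w = np.zeros(scores.shape[1])
    for _ in range(steps):
        # Probability that the preferred model wins under the current weights.
        p = 1.0 / (1.0 + np.exp(-(diffs @ w)))
        # Gradient of the mean negative log-likelihood plus an L2 penalty.
        grad = -((1.0 - p)[:, None] * diffs).mean(axis=0) + l2 * w
        w -= lr * grad
    return w


def rank_models(scores, w):
    """Return model indices ordered from best to worst by weighted score."""
    return np.argsort(-(np.asarray(scores, dtype=float) @ w))
```

In this sketch, questions whose outcomes agree with the preference pairs receive positive weight and questions that disagree receive negative weight. A model that was absent from the preference data can then be ranked by passing its per-question results to `rank_models` together with the learned weights.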