Putting HUMANS first: Efficient LAM Evaluation with Human Preference Alignment
Researchers demonstrate that minimal subsets of just 50 examples (0.3% of the data) can reliably evaluate large audio models (LAMs), with over 93% correlation to full-benchmark scores. By training regression models on human-preference-aligned subsets, they achieve 98% correlation with user satisfaction, outperforming full-benchmark evaluations, and release the HUMANS benchmark as an efficient LAM evaluation tool.
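The subset-plus-regression idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data is synthetic (simulated models with a latent ability answering items of varying difficulty), the 50-item subset is drawn at random rather than with the authors' selection method, and a single-feature linear fit stands in for whatever regression models the paper actually trains. It only shows the general pattern of fitting a regressor on subset scores and checking how well its predictions correlate with full-benchmark scores on held-out models.

```python
import numpy as np

# Synthetic setup (hypothetical): 200 simulated models, 2000 benchmark items.
rng = np.random.default_rng(0)
n_models, n_items, subset_size = 200, 2000, 50

ability = rng.normal(size=n_models)      # latent skill of each simulated model
difficulty = rng.normal(size=n_items)    # latent difficulty of each item
p_correct = 1.0 / (1.0 + np.exp(-(ability[:, None] - difficulty[None, :])))
scores = (rng.random((n_models, n_items)) < p_correct).astype(float)

# Full-benchmark score: mean correctness over every item.
full_score = scores.mean(axis=1)

# Assumption: a random 50-item subset stands in for a carefully selected one.
subset = rng.choice(n_items, size=subset_size, replace=False)
subset_score = scores[:, subset].mean(axis=1)

# Fit a one-feature linear regression on half the models, test on the rest.
train, test = slice(0, 100), slice(100, 200)
slope, intercept = np.polyfit(subset_score[train], full_score[train], 1)
predicted = slope * subset_score[test] + intercept

# Pearson correlation between predicted and actual full-benchmark scores.
corr = float(np.corrcoef(predicted, full_score[test])[0, 1])
print(f"subset size: {subset.size}, held-out correlation: {corr:.3f}")
```

To mirror the human-preference setting, the regression target `full_score` would be replaced by per-model user satisfaction ratings, which this sketch does not simulate.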