AI · Bullish · arXiv – CS AI · 3h ago · 7/10
Putting HUMANS first: Efficient LAM Evaluation with Human Preference Alignment
Researchers demonstrate that minimal subsets of just 50 examples (0.3% of the data) can reliably evaluate large audio models (LAMs), with 93%+ correlation to full benchmarks. By training regression models on human-preference-aligned subsets, they achieve 98% correlation with user satisfaction, outperforming full benchmark evaluations, and they release the HUMANS benchmark as an efficient LAM evaluation tool.
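To make the subset-versus-full-benchmark comparison concrete, here is a minimal sketch of how such a correlation could be measured. It is not the authors' method: the scores are synthetic, the 50 items are drawn at random rather than selected or aligned to human preferences, and every name (`make_synthetic_scores`, `subset_correlation`, the model and item counts) is a hypothetical chosen for illustration.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def subset_correlation(scores, subset_idx):
    """Correlate per-model mean score on a subset with the full-benchmark mean.

    scores[m][i] is the score of model m on benchmark item i.
    """
    full_means = [sum(row) / len(row) for row in scores]
    subset_means = [
        sum(row[i] for i in subset_idx) / len(subset_idx) for row in scores
    ]
    return pearson(full_means, subset_means)


def make_synthetic_scores(n_models=20, n_items=2000, seed=0):
    """Synthetic pass/fail scores (an assumption, not real benchmark data)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_models):
        skill = rng.uniform(0.2, 0.9)  # hypothetical per-model accuracy
        scores.append(
            [1.0 if rng.random() < skill else 0.0 for _ in range(n_items)]
        )
    return scores


scores = make_synthetic_scores()
# Randomly chosen 50-item subset; the paper's selection procedure is not shown here.
subset_idx = random.Random(1).sample(range(2000), 50)
r = subset_correlation(scores, subset_idx)
print(f"Correlation between 50-item subset and full benchmark: {r:.3f}")
```

The same `subset_correlation` function could be pointed at real per-item model scores; comparing subset scores against user-satisfaction ratings instead of the full-benchmark mean would only require swapping the first argument passed to `pearson`.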