y0news
#aggregation1 article
1 articles
AIBullisharXiv โ€“ CS AI ยท 6h ago1
๐Ÿง 

CARE: Confounder-Aware Aggregation for Reliable LLM Evaluation

Researchers introduce CARE, a new framework for improving LLM evaluation by addressing correlated errors in AI judge ensembles. The method separates true quality signals from confounding factors like verbosity and style preferences, achieving up to 26.8% error reduction across 12 benchmarks.