AIBullisharXiv โ CS AI ยท 10h ago6/10
๐ง
BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation
Researchers introduce BERT-as-a-Judge, a lightweight alternative to LLM-based evaluation methods that assesses generative model outputs with greater accuracy than lexical approaches while requiring significantly less computational overhead. The method demonstrates that existing lexical evaluation techniques poorly correlate with human judgment across 36 models and 15 tasks, establishing a practical middle ground between rigid rule-based and expensive LLM-judge evaluation paradigms.