AI · Neutral · arXiv – CS AI · 7h ago · 6/10
SCOPE: Selective Conformal Optimized Pairwise LLM Judging
Researchers introduce SCOPE, a framework that improves LLM-based pairwise evaluation by calibrating confidence thresholds to control error rates. Combined with a new uncertainty metric called Bidirectional Preference Entropy (BPE), the approach achieves reliable judgment quality while accepting significantly more evaluations than existing methods.
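As a rough illustration of the idea in the summary, the sketch below picks an uncertainty threshold on a labeled calibration set so that the judgments it accepts stay under a target error rate. It is a generic selective-prediction sketch, not the paper's procedure. The names `bidirectional_uncertainty` and `calibrate_threshold` are hypothetical, and so is the formula used for the uncertainty score (binary entropy of the preference probability averaged over both answer orderings), since the summary does not define BPE. The sketch also uses the plain empirical error rate and omits the finite-sample correction a conformal guarantee would need.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) preference; 0 at p = 0 or p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def bidirectional_uncertainty(p_a_when_first, p_a_when_second):
    """Assumed stand-in for BPE, not the paper's definition.

    Averages the judge's probability that response A wins across the two
    presentation orders, then takes the binary entropy of that average.
    """
    p = 0.5 * (p_a_when_first + p_a_when_second)
    return binary_entropy(p)


def calibrate_threshold(uncertainties, correct, alpha):
    """Largest threshold t whose accepted set (uncertainty <= t) has
    empirical error rate <= alpha on the calibration data.

    Returns None when no threshold meets the target.
    """
    pairs = sorted(zip(uncertainties, correct))
    best = None
    errors = 0
    for i, (u, ok) in enumerate(pairs, start=1):
        if not ok:
            errors += 1
        # Evaluate only at the last item of a run of tied scores, because
        # a threshold accepts or rejects all tied items together.
        if i < len(pairs) and pairs[i][0] == u:
            continue
        if errors / i <= alpha:
            best = u
    return best


# Toy calibration data: low-uncertainty judgments are correct here.
scores = [0.1, 0.2, 0.3, 0.9, 1.0]
labels = [True, True, True, False, False]
threshold = calibrate_threshold(scores, labels, alpha=0.1)
```

At evaluation time, a judgment would be accepted only when its uncertainty is at or below `threshold`, and the judge would abstain otherwise. A looser `alpha` accepts more judgments at the cost of a higher tolerated error rate.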