AI · Bullish · arXiv – CS AI · 11h ago · 6/10
Adaptive Rigor in AI System Evaluation using Temperature-Controlled Verdict Aggregation via Generalized Power Mean
Researchers introduce Temperature-Controlled Verdict Aggregation (TCVA), an evaluation method that adapts the rigor of AI system assessment to the requirements of the application domain. TCVA combines verdict scoring with generalized power-mean aggregation and a tunable temperature parameter. The authors report evaluation results that align with human judgment at a level comparable to existing benchmarks, while remaining computationally efficient.
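For readers unfamiliar with the generalized power mean, here is a minimal Python sketch of how a single exponent can make an aggregate score stricter or more lenient. It is an illustration only, not the paper's implementation: the summary does not say how TCVA maps its temperature parameter to the exponent, so the sketch exposes the exponent `p` directly, and the function name `power_mean` and the sample verdict scores are hypothetical.

```python
import math


def power_mean(scores, p):
    """Generalized power mean of strictly positive scores.

    Assumption: `p` stands in for whatever exponent TCVA derives from
    its temperature; that mapping is not given in the summary.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    if any(s <= 0 for s in scores):
        raise ValueError("scores must be strictly positive")
    if p == math.inf:        # limit p -> +inf: the maximum (most lenient)
        return max(scores)
    if p == -math.inf:       # limit p -> -inf: the minimum (most strict)
        return min(scores)
    if p == 0:               # limit p -> 0: the geometric mean
        return math.exp(sum(math.log(s) for s in scores) / len(scores))
    return (sum(s ** p for s in scores) / len(scores)) ** (1.0 / p)


# Hypothetical per-verdict scores for one evaluated response.
verdict_scores = [1.0, 1.0, 0.5, 0.25]

for p in (-math.inf, -1, 0, 1, math.inf):
    print(f"p={p}: {power_mean(verdict_scores, p):.3f}")
```

Lower exponents pull the aggregate toward the weakest verdict, which would suit high-stakes domains. Higher exponents pull it toward the strongest verdict, which tolerates isolated failures.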