y0news
AI · Neutral · arXiv – CS AI · 6h ago

RubricBench: Aligning Model-Generated Rubrics with Human Standards

RubricBench is a new benchmark of 1,147 pairwise comparisons designed to evaluate rubric-based assessment methods for large language models. The study finds a significant gap between human-annotated and AI-generated rubrics, indicating that current state-of-the-art models struggle to create valid evaluation criteria on their own.