RubricBench: Aligning Model-Generated Rubrics with Human Standards
RubricBench is a new benchmark of 1,147 pairwise comparisons for evaluating rubric-based assessment of Large Language Models. The paper reports a significant gap between human-annotated and AI-generated rubrics: current state-of-the-art models struggle to autonomously produce valid evaluation criteria.
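The paper's scoring protocol isn't reproduced here, but a pairwise-comparison benchmark like this is typically scored by checking how often rubric-derived scores pick the same response humans preferred. Below is a minimal Python sketch of that idea; all names (`PairwiseComparison`, `agreement_rate`) and the toy scores are illustrative assumptions, not the paper's actual data or API.

```python
from dataclasses import dataclass

@dataclass
class PairwiseComparison:
    """One hypothetical benchmark item: two candidate responses plus the human preference."""
    response_a: str
    response_b: str
    human_preference: str  # "a" or "b"

def rubric_preference(score_a: float, score_b: float) -> str:
    """Pick whichever response the rubric scored higher (ties default to 'a')."""
    return "a" if score_a >= score_b else "b"

def agreement_rate(comparisons: list[PairwiseComparison],
                   rubric_scores: list[tuple[float, float]]) -> float:
    """Fraction of comparisons where the rubric's pick matches the human preference."""
    matches = sum(
        1
        for comp, (sa, sb) in zip(comparisons, rubric_scores)
        if rubric_preference(sa, sb) == comp.human_preference
    )
    return matches / len(comparisons)

# Toy usage with made-up scores (not from the paper).
comps = [
    PairwiseComparison("draft A", "draft B", "b"),
    PairwiseComparison("draft C", "draft D", "a"),
]
scores = [(3.0, 4.5), (4.0, 2.5)]
print(f"agreement: {agreement_rate(comps, scores):.2f}")  # agreement: 1.00
```

Under this framing, the human-vs-AI rubric gap the paper describes would show up as a lower agreement rate when the rubric scores come from model-generated criteria rather than human-annotated ones.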