arXiv · CS AI · 6h ago
RubricBench: Aligning Model-Generated Rubrics with Human Standards
RubricBench is a new benchmark of 1,147 pairwise comparisons designed to evaluate rubric-based assessment methods for Large Language Models. The results reveal a significant gap between human-annotated and AI-generated rubrics, showing that current state-of-the-art models struggle to autonomously create valid evaluation criteria.