AI · Bullish · arXiv CS AI · 6h ago

Autorubric: A Unified Framework for Rubric-Based LLM Evaluation

Researchers introduce Autorubric, an open-source Python framework that standardizes rubric-based evaluation of text generated by large language models (LLMs). It consolidates previously scattered evaluation techniques into a single framework with configurable criteria, multi-judge ensembles, bias mitigation, and reliability metrics, and it is assessed across three evaluation benchmarks.
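
To make the idea of rubric-based scoring with a multi-judge ensemble concrete, here is a minimal sketch in plain Python. It does not use or reproduce Autorubric's actual API, which the summary does not describe. Every name in it (`Criterion`, `keyword_judge`, `score_response`, `agreement_rate`) is hypothetical, the judges are simple keyword checks standing in for LLM calls, and the majority vote and unanimity rate are assumed stand-ins for whatever aggregation and reliability metrics the framework really implements.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Criterion:
    """One rubric item (hypothetical structure, not Autorubric's)."""
    name: str
    weight: float


# A judge maps (criterion name, response text) to a pass/fail verdict.
Judge = Callable[[str, str], bool]


def keyword_judge(keywords: Dict[str, str]) -> Judge:
    """Toy stand-in for an LLM judge: passes if a keyword appears."""
    def judge(criterion: str, response: str) -> bool:
        return keywords[criterion] in response.lower()
    return judge


def majority_vote(votes: List[bool]) -> bool:
    """True only when strictly more than half of the judges pass."""
    return sum(votes) * 2 > len(votes)


def score_response(response: str, rubric: List[Criterion],
                   judges: List[Judge]) -> float:
    """Weighted fraction of criteria that pass the majority vote."""
    total = sum(c.weight for c in rubric)
    earned = sum(
        c.weight for c in rubric
        if majority_vote([j(c.name, response) for j in judges])
    )
    return earned / total


def agreement_rate(response: str, rubric: List[Criterion],
                   judges: List[Judge]) -> float:
    """Fraction of criteria on which all judges give the same verdict."""
    unanimous = sum(
        1 for c in rubric
        if len({j(c.name, response) for j in judges}) == 1
    )
    return unanimous / len(rubric)


rubric = [Criterion("cites_source", 2.0), Criterion("polite", 1.0)]
judges = [
    keyword_judge({"cites_source": "according to", "polite": "please"}),
    keyword_judge({"cites_source": "according", "polite": "thank"}),
    keyword_judge({"cites_source": "manual", "polite": "please"}),
]
response = "According to the manual, restart the device. Thank you!"

score = score_response(response, rubric, judges)
agreement = agreement_rate(response, rubric, judges)
```

In this toy run all three judges pass `cites_source`, while only one of three passes `polite`, so the response earns the weight of the first criterion only and the judges are unanimous on half of the rubric. A real setup would replace the keyword checks with model calls and would likely use a chance-corrected agreement statistic rather than a raw unanimity rate.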