AI · Neutral · arXiv – CS AI · 1d ago · 6/10
Qworld: Question-Specific Evaluation Criteria for LLMs
Researchers introduce Qworld, a method for evaluating large language models (LLMs) that replaces static rubrics with question-specific criteria generated through recursive expansion trees. The generated criteria cover 89% of expert-authored criteria, and applying them to 11 frontier LLMs reveals capability differences that traditional evaluation methods miss.