🤖 AI Summary
Researchers introduce Qworld, a new method for evaluating large language models that generates question-specific criteria using recursive expansion trees instead of static rubrics. The approach covers 89% of expert-authored criteria and reveals capability differences across 11 frontier LLMs that traditional evaluation methods miss.
Key Takeaways
- Qworld generates question-specific evaluation criteria through hierarchical expansion of scenarios, perspectives, and binary criteria.
- The method covers 89% of expert-authored criteria while generating 79% novel criteria validated by human experts.
- Experts rate Qworld criteria higher in insight and granularity compared to existing evaluation methods.
- Testing on 11 frontier LLMs revealed capability differences in long-term impact, equity, and interdisciplinary reasoning.
- The approach adapts evaluation to each question rather than relying on fixed task-level criteria.
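The hierarchical expansion described above can be sketched as a small tree-building routine. This is a hypothetical illustration, not the paper's implementation: the `expand_*` functions stand in for the LLM prompts Qworld would use at each level, and here they just return canned examples so the structure is visible.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One node in the expansion tree: a question, scenario,
    perspective, or leaf-level binary criterion."""
    label: str
    children: list["Node"] = field(default_factory=list)

def expand_scenarios(question: str) -> list[str]:
    # Placeholder for an LLM call proposing concrete scenarios.
    return [f"{question} :: near-term deployment",
            f"{question} :: long-term impact"]

def expand_perspectives(scenario: str) -> list[str]:
    # Placeholder for an LLM call proposing stakeholder perspectives.
    return [f"{scenario} / equity", f"{scenario} / feasibility"]

def expand_criteria(perspective: str) -> list[str]:
    # Leaf level: yes/no criteria a grader can check directly.
    return [f"Does the answer address {perspective}?"]

def build_tree(question: str) -> Node:
    """Question -> scenarios -> perspectives -> binary criteria."""
    root = Node(question)
    for s in expand_scenarios(question):
        s_node = Node(s)
        for p in expand_perspectives(s):
            s_node.children.append(
                Node(p, [Node(c) for c in expand_criteria(p)]))
        root.children.append(s_node)
    return root

def leaf_criteria(node: Node) -> list[str]:
    """Collect the binary criteria at the leaves of the tree."""
    if not node.children:
        return [node.label]
    return [c for child in node.children for c in leaf_criteria(child)]

tree = build_tree("Should cities subsidize e-bikes?")
print(len(leaf_criteria(tree)))  # 2 scenarios x 2 perspectives x 1 = 4
```

Because each level multiplies the number of branches, even shallow trees yield a question-specific rubric far broader than a single fixed checklist, which is the intuition behind the coverage and novelty numbers above.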
#llm-evaluation #ai-research #qworld #language-models #evaluation-criteria #arxiv #frontier-llms #ai-testing
Read Original → via arXiv – CS AI