arXiv – CS AI · 6h ago
SCRuB: Social Concept Reasoning under Rubric-Based Evaluation
Researchers introduce SCRuB, a new evaluation framework for measuring how well large language models reason about social concepts: the abstract ideas underlying norms, culture, and institutions. The study compares frontier models against PhD-level experts on 4,711 prompts and finds that the models outperform the human experts across all evaluated dimensions, with model responses preferred in 74.4% of comparative judgments. This result suggests that single-turn social-concept reasoning tasks are approaching evaluation saturation.