HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models
arXiv – CS AI | Zhaolu Kang, Junhao Gong, Jiaxu Yan, Wanke Xia, Yian Wang, Ziwen Wang, Huaxuan Ding, Zhuo Cheng, Wenhao Cao, Zhiyuan Feng, Siqi He, Shannan Yan, Junzhe Chen, Xiaomin He, Chaoya Jiang, Wei Ye, Kaidong Yu, Xuelong Li
AI Summary
Researchers introduce HSSBench, a new benchmark designed to evaluate multimodal large language models (MLLMs) on Humanities and Social Sciences tasks across multiple languages. The benchmark contains over 13,000 samples and reveals significant challenges for current state-of-the-art models in cross-disciplinary reasoning.
Key Takeaways
- HSSBench is the first dedicated benchmark for evaluating MLLMs on Humanities and Social Sciences tasks requiring interdisciplinary thinking.
- The benchmark includes over 13,000 samples across six categories in multiple languages, including the official UN languages.
- Current state-of-the-art MLLMs struggle significantly with HSS tasks that require linking abstract concepts with visual representations.
- A novel data generation pipeline uses collaboration between domain experts and automated agents to create high-quality samples.
- The research highlights gaps in current MLLM evaluation methods, which focus primarily on STEM reasoning rather than horizontal, interdisciplinary thinking.
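To make the evaluation setup concrete, here is a minimal sketch of how per-category accuracy is typically computed on a multiple-choice multimodal benchmark of this kind. The sample schema (`category`, `question`, `image`, `choices`, `answer`) and the `model_answer` callable are illustrative assumptions; HSSBench's actual data format and scoring protocol are not described in this summary.

```python
# Hypothetical sketch of per-category accuracy scoring on a
# multiple-choice benchmark. The sample fields and model_answer()
# interface are assumptions, not HSSBench's actual API.
from collections import defaultdict

def evaluate(samples, model_answer):
    """Return accuracy per category for labeled multiple-choice samples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for s in samples:
        # The model sees the question, an optional image, and the choices.
        pred = model_answer(s["question"], s.get("image"), s["choices"])
        total[s["category"]] += 1
        if pred == s["answer"]:
            correct[s["category"]] += 1
    return {cat: correct[cat] / total[cat] for cat in total}

# Toy usage: two made-up samples and a trivial "model" that always
# picks the last choice.
samples = [
    {"category": "Art", "question": "Which movement?", "image": None,
     "choices": ["Baroque", "Cubism"], "answer": "Cubism"},
    {"category": "History", "question": "Which era?", "image": None,
     "choices": ["Bronze Age", "Iron Age"], "answer": "Iron Age"},
]
acc = evaluate(samples, lambda q, img, choices: choices[-1])
```

Reporting accuracy per category (rather than one aggregate number) is what lets a benchmark like this expose where models are weakest across the humanities and social sciences domains.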
#mllm #benchmark #multimodal #ai-evaluation #humanities #social-sciences #cross-disciplinary #reasoning #language-models