HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models
arXiv – CS AI | Zhaolu Kang, Junhao Gong, Jiaxu Yan, Wanke Xia, Yian Wang, Ziwen Wang, Huaxuan Ding, Zhuo Cheng, Wenhao Cao, Zhiyuan Feng, Siqi He, Shannan Yan, Junzhe Chen, Xiaomin He, Chaoya Jiang, Wei Ye, Kaidong Yu, Xuelong Li
AI Summary
Researchers introduce HSSBench, a new benchmark designed to evaluate multimodal large language models (MLLMs) on Humanities and Social Sciences tasks across multiple languages. The benchmark contains over 13,000 samples and reveals significant challenges for current state-of-the-art models in cross-disciplinary reasoning.
Key Takeaways
- HSSBench is the first dedicated benchmark for evaluating MLLMs on Humanities and Social Sciences tasks requiring interdisciplinary thinking.
- The benchmark includes over 13,000 samples across six categories in multiple languages, including the official UN languages.
- Current state-of-the-art MLLMs struggle significantly with HSS tasks that require linking abstract concepts with visual representations.
- A novel data generation pipeline uses collaboration between domain experts and automated agents to create high-quality samples.
- The research highlights gaps in current MLLM evaluation methods, which focus primarily on STEM reasoning rather than horizontal, interdisciplinary thinking.
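To make the evaluation setup concrete, here is a minimal sketch of how per-category accuracy is typically computed on a multiple-choice multimodal benchmark of this kind. The sample schema (`category`, `question`, `image`, `choices`, `answer`) and the `model_answer` callable are illustrative assumptions; HSSBench's actual data format and scoring protocol are not described in this summary.

```python
# Hypothetical sketch of per-category accuracy scoring on a
# multiple-choice benchmark. The sample fields and model_answer()
# interface are assumptions, not HSSBench's actual API.
from collections import defaultdict

def evaluate(samples, model_answer):
    """Return accuracy per category for labeled multiple-choice samples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for s in samples:
        # The model sees the question, an optional image, and the choices.
        pred = model_answer(s["question"], s.get("image"), s["choices"])
        total[s["category"]] += 1
        if pred == s["answer"]:
            correct[s["category"]] += 1
    return {cat: correct[cat] / total[cat] for cat in total}

# Toy usage: two made-up samples and a trivial "model" that always
# picks the last choice.
samples = [
    {"category": "Art", "question": "Which movement?", "image": None,
     "choices": ["Baroque", "Cubism"], "answer": "Cubism"},
    {"category": "History", "question": "Which era?", "image": None,
     "choices": ["Bronze Age", "Iron Age"], "answer": "Iron Age"},
]
acc = evaluate(samples, lambda q, img, choices: choices[-1])
```

Reporting accuracy per category (rather than one aggregate number) is what lets a benchmark like this expose where models are weakest across the humanities and social sciences domains.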
#mllm #benchmark #multimodal #ai-evaluation #humanities #social-sciences #cross-disciplinary #reasoning #language-models