y0news
🧠 AI ⚪ Neutral

HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models

arXiv – CS AI | Zhaolu Kang, Junhao Gong, Jiaxu Yan, Wanke Xia, Yian Wang, Ziwen Wang, Huaxuan Ding, Zhuo Cheng, Wenhao Cao, Zhiyuan Feng, Siqi He, Shannan Yan, Junzhe Chen, Xiaomin He, Chaoya Jiang, Wei Ye, Kaidong Yu, Xuelong Li
🤖 AI Summary

Researchers introduce HSSBench, a new benchmark designed to evaluate multimodal large language models (MLLMs) on Humanities and Social Sciences tasks across multiple languages. The benchmark contains over 13,000 samples and reveals significant challenges for current state-of-the-art models in cross-disciplinary reasoning.

Key Takeaways
  • HSSBench is the first dedicated benchmark for evaluating MLLMs on Humanities and Social Sciences (HSS) tasks that require interdisciplinary thinking.
  • The benchmark includes over 13,000 samples across six categories in multiple languages, including the official languages of the United Nations.
  • Current state-of-the-art MLLMs struggle significantly with HSS tasks that require linking abstract concepts to visual representations.
  • A novel data-generation pipeline, in which domain experts and automated agents collaborate, produces the high-quality samples.
  • The research highlights a gap in current MLLM evaluation, which focuses primarily on STEM reasoning rather than the horizontal, interdisciplinary thinking that HSS tasks demand.