AI · Neutral · arXiv – CS AI · 5h ago
HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models
Researchers introduce HSSBench, a new benchmark for evaluating multimodal large language models (MLLMs) on Humanities and Social Sciences (HSS) tasks in multiple languages. The benchmark contains over 13,000 samples. Results on it show that current state-of-the-art models still struggle with cross-disciplinary reasoning.