AINeutralarXiv โ CS AI ยท 14h ago6/10
๐ง
HumanVBench: Probing Human-Centric Video Understanding in MLLMs with Automatically Synthesized Benchmarks
Researchers introduced HumanVBench, a comprehensive benchmark for evaluating how well multimodal AI models understand human-centric video content across 16 tasks including emotion recognition and speech-visual alignment. The study evaluated 30 leading MLLMs and found significant performance gaps, even among top proprietary models, while introducing automated synthesis pipelines to enable scalable benchmark creation with minimal human effort.