AINeutralarXiv – CS AI · 3h ago6/10
🧠
SONIC-O1: A Real-World Benchmark for Evaluating Multimodal Large Language Models on Audio-Video Understanding
Researchers introduce SONIC-O1, a comprehensive benchmark for evaluating multimodal large language models on audio-video understanding tasks. The study reveals significant performance gaps between closed-source and open-source models, particularly in temporal localization, and identifies demographic disparities in model behavior across 60 hours of real-world conversational data.
🏢 Hugging Face