y0news
AnalyticsDigestsSourcesRSSAICrypto
#worldsense1 article
1 articles
AINeutralarXiv โ€“ CS AI ยท 5d ago7/103
๐Ÿง 

WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs

Researchers have introduced WorldSense, the first benchmark for evaluating multimodal AI systems that process visual, audio, and text inputs simultaneously. The benchmark contains 1,662 synchronized audio-visual videos across 67 subcategories and 3,172 QA pairs, revealing that current state-of-the-art models achieve only 65.1% accuracy on real-world understanding tasks.