AI · Neutral · arXiv – CS AI · 3h ago · 6/10
MTAVG-Bench 2.0: Diagnosing Failure Modes of Cinematic Expressiveness in Multi-Talker Audio-Video Generation
Researchers introduce MTAVG-Bench 2.0, a benchmark that evaluates multi-talker audio-video generation models on more than basic metrics such as lip-sync. It contains over 10,000 question-answering instances designed to diagnose failures of cinematic expressiveness along four dimensions: acting, narrative, atmosphere, and audio-visual language.