AI · Neutral · arXiv – CS AI · 10h ago · 6/10
KARMA-MV: A Benchmark for Causal Question Answering on Music Videos
Researchers introduce KARMA-MV, a large-scale dataset of 37,737 multiple-choice questions derived from 2,682 YouTube music videos, designed to benchmark AI models' ability to reason about causal relationships between visual dynamics and musical structure. The questions are produced with LLM-based generation for scalability, and the authors propose a causal knowledge graph approach to improve vision-language model performance on cross-modal audio-visual reasoning tasks.