AI · Bullish · arXiv – CS AI · 3h ago · 6/10
Agentic Active Omni-Modal Perception for Multi-Hop Audio-Visual Reasoning
Researchers introduce MOV-Bench, a benchmark for evaluating multi-hop audio-visual reasoning in large language models, and propose AOP-Agent, an agentic framework that enables open-source multimodal LLMs to perform active perception across temporally dispersed audio and visual evidence without additional training.