AI · Neutral · arXiv – CS AI · 18h ago · 6/10
See More, Think Deeper: Query-Expanded Visual Evidence and Answer-Clue Guided Reflection for Long Video Understanding
Researchers introduce CoVER, a framework for Video Large Language Models that improves long-video understanding by expanding a question into multiple search queries to gather visual evidence, then using answer-specific visual feedback to verify the result. The approach outperforms similarly sized models and some closed-source alternatives.