AI · Bullish · arXiv – CS AI · 5h ago · 7/10
MemDreamer: Decoupling Perception and Reasoning for Long Video Understanding via Hierarchical Graph Memory and Agentic Retrieval Mechanism
Researchers introduce MemDreamer, a framework that lets Vision-Language Models process hours-long videos by decoupling perception from reasoning through a hierarchical graph memory and an agentic retrieval mechanism. The approach is reported to achieve state-of-the-art results while reducing the context required to 2% of that needed for full video ingestion, and it is presented as a new paradigm for long-form multimodal understanding.