y0news
#long-horizon-reasoning1 article
1 articles
AIBullisharXiv โ€“ CS AI ยท 6h ago2
๐Ÿง 

From Verbatim to Gist: Distilling Pyramidal Multimodal Memory via Semantic Information Bottleneck for Long-Horizon Video Agents

Researchers have developed MM-Mem, a new pyramidal multimodal memory architecture that enables AI systems to better understand long-horizon videos by mimicking human cognitive memory processes. The system addresses current limitations in multimodal large language models by creating a hierarchical memory structure that progressively distills detailed visual information into high-level semantic understanding.