#video-mllm News & Analysis

2 articles tagged with #video-mllm. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · Jun 27/10

Moment-Video: Diagnosing Temporal Fidelity of Video MLLMs on Momentary Visual Events

Researchers introduce Moment-Video, a benchmark revealing that current video multimodal large language models (MLLMs) struggle to understand brief, momentary visual events that last only a few frames. Testing 33 models shows the best achieves only 39.6% accuracy, exposing a critical gap in temporal fidelity that persists despite advances in general video understanding.

AI · Bullish · arXiv – CS AI · Jun 27/10

AdaCodec: A Predictive Visual Code for Video MLLMs

AdaCodec introduces a predictive visual coding approach for video multimodal large language models that adaptively allocates visual tokens based on scene complexity. Rather than encoding each frame independently as RGB images, the system sends full reference frames only when scenes are unpredictable and uses compact tokens for inter-frame changes, achieving superior performance at 1/7th the token budget while reducing latency significantly.
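
The allocation idea in the summary above can be illustrated with a toy sketch. This is not AdaCodec's implementation: the token costs, the threshold, the use of mean absolute difference against the last reference frame as a stand-in for "predictability", and the names `plan_tokens` and `total_tokens` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Illustrative assumptions only; none of these values come from AdaCodec.
FULL_TOKENS = 196   # hypothetical cost of sending a full reference frame
DELTA_TOKENS = 16   # hypothetical cost of a compact inter-frame change code
THRESHOLD = 0.25    # hypothetical cutoff for "unpredictable" frames


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def plan_tokens(
    frames: List[Sequence[float]], threshold: float = THRESHOLD
) -> List[Tuple[str, int]]:
    """Choose, per frame, between a full reference and a compact delta code.

    A frame is treated as unpredictable when it differs from the most recent
    reference frame by more than `threshold` (a crude proxy, assumed here).
    """
    plan: List[Tuple[str, int]] = []
    reference = None
    for frame in frames:
        if reference is None or mean_abs_diff(frame, reference) > threshold:
            plan.append(("reference", FULL_TOKENS))
            reference = frame  # later frames are compared against this one
        else:
            plan.append(("delta", DELTA_TOKENS))
    return plan


def total_tokens(plan: List[Tuple[str, int]]) -> int:
    """Total token budget consumed by a plan."""
    return sum(cost for _, cost in plan)


if __name__ == "__main__":
    # Four tiny two-pixel "frames" with one scene change before the third.
    frames = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.0, 0.9]]
    plan = plan_tokens(frames)
    print(plan)
    print(total_tokens(plan), "tokens vs", FULL_TOKENS * len(frames), "if every frame were sent in full")
```

In this toy setup the savings come entirely from how often frames stay close to the last reference; the actual ratio reported for AdaCodec depends on its learned codec, which this sketch does not model.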