AI · Bullish · arXiv – CS AI · 7h ago · 7/10
AdaCodec: A Predictive Visual Code for Video MLLMs
AdaCodec introduces a predictive visual coding approach for video multimodal large language models (MLLMs) that allocates visual tokens adaptively according to scene complexity. Rather than encoding every frame independently as an RGB image, the system sends a full reference frame only when the scene is unpredictable and represents inter-frame changes with compact tokens. This achieves superior performance at one-seventh of the token budget and significantly reduces latency.
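
The allocation idea described above can be illustrated with a minimal sketch. This is not AdaCodec's actual method: the predictability test (mean absolute difference from the previous frame), the threshold, the token counts, and the names `plan_tokens` and `mean_abs_diff` are all hypothetical choices made only to show how a reference/delta decision could shrink a token budget.

```python
from typing import List, Tuple

Frame = List[float]  # a frame flattened to pixel intensities in [0, 1]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two equal-sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def plan_tokens(
    frames: List[Frame],
    threshold: float = 0.2,   # hypothetical "unpredictable scene" cutoff
    full_tokens: int = 64,    # hypothetical cost of a full reference frame
    delta_tokens: int = 4,    # hypothetical cost of a compact change code
) -> List[Tuple[str, int]]:
    """Decide, per frame, whether to spend a full or a compact token budget."""
    plan: List[Tuple[str, int]] = []
    previous = None
    for frame in frames:
        # The first frame, or any frame that differs strongly from the
        # previous one, is treated as unpredictable and sent in full.
        if previous is None or mean_abs_diff(frame, previous) > threshold:
            plan.append(("reference", full_tokens))
        else:
            plan.append(("delta", delta_tokens))
        previous = frame
    return plan


# Toy clip: three near-identical frames, a scene cut, then a small change.
frames = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.0],
    [0.0, 0.1, 0.1, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 0.9, 1.0, 1.0],
]
plan = plan_tokens(frames)
adaptive_total = sum(tokens for _, tokens in plan)
per_frame_total = 64 * len(frames)  # cost if every frame were sent in full
```

In this toy clip only the first frame and the scene cut are sent as reference frames, so the adaptive plan spends far fewer tokens than encoding every frame in full. The actual savings in AdaCodec depend on its learned predictor and tokenizer, which this sketch does not model.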