AI · Bullish · arXiv – CS AI · 6h ago
Stateful Token Reduction for Long-Video Hybrid VLMs
Researchers developed a new token reduction method for hybrid vision-language models that process long videos, achieving 3.8-4.2x speedup while retaining only 25% of visual tokens. The approach uses progressive reduction and unified scoring for both attention and Mamba blocks, maintaining near-baseline accuracy on long-context video benchmarks.
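The idea of progressive reduction can be illustrated with a minimal sketch. The summary does not give the scoring function or the reduction schedule, so everything here is an assumption for illustration: the geometric schedule, the per-stage score lists, and the names `keep_schedule`, `top_k_in_order`, and `progressive_reduce` are hypothetical and not taken from the paper. The sketch only shows how a token set could shrink stage by stage until 25% of the visual tokens remain.

```python
def keep_schedule(num_tokens, final_ratio=0.25, num_stages=3):
    """Token counts after each stage, shrinking geometrically to final_ratio.

    Assumption: a geometric schedule; the actual schedule is not stated.
    """
    per_stage = final_ratio ** (1.0 / num_stages)
    return [
        max(1, round(num_tokens * per_stage ** stage))
        for stage in range(1, num_stages + 1)
    ]


def top_k_in_order(indices, scores, k):
    """Keep the k highest-scoring token indices, preserving temporal order."""
    ranked = sorted(indices, key=lambda i: scores[i], reverse=True)[:k]
    return sorted(ranked)


def progressive_reduce(scores_per_stage, final_ratio=0.25):
    """Prune tokens stage by stage instead of in a single cut.

    scores_per_stage[s][i] is a hypothetical importance score for token i
    at stage s. In a hybrid model the same kind of score would be computed
    for attention and Mamba blocks alike; here it is simply given as input.
    """
    num_tokens = len(scores_per_stage[0])
    counts = keep_schedule(num_tokens, final_ratio, len(scores_per_stage))
    kept = list(range(num_tokens))
    for scores, k in zip(scores_per_stage, counts):
        kept = top_k_in_order(kept, scores, k)
    return kept
```

With 16 tokens and two stages, the schedule keeps 8 tokens after the first stage and 4 after the second, so the final set is 25% of the input. Tokens dropped at an early stage are never reconsidered, which is what makes the reduction progressive rather than a single top-k selection.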