AI · Bullish · arXiv – CS AI · 9h ago · 6/10
TTF: Temporal Token Fusion for Efficient Video-Language Model
Researchers introduce Temporal Token Fusion (TTF), a training-free compression technique that reduces visual tokens in video-language models by 67% while maintaining 99.5% accuracy. The method addresses the critical bottleneck of LLM prefill costs in video understanding by identifying and fusing redundant tokens across video frames using local similarity matching.
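To make the idea of fusing redundant tokens across frames with local similarity matching concrete, here is a minimal sketch in plain Python. It is not the authors' algorithm: the function name `fuse_tokens`, the cosine-similarity measure, the `radius` search window, the `threshold` value, and the running-mean fusion rule are all assumptions chosen for illustration. Each frame is treated as a row-major grid of token vectors, and a token is fused into an earlier one when a sufficiently similar token is found near the same grid position in the previous frame.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def fuse_tokens(frames, height, width, radius=1, threshold=0.9):
    """Hypothetical sketch of temporal token fusion.

    frames: list of frames, each a list of height*width token vectors (row-major).
    radius, threshold: assumed hyperparameters, not values from the paper.
    Returns the kept tokens as (frame_index, grid_position, vector) tuples.
    """
    kept = []                      # entries: [frame_idx, pos, vector, fused_count]
    ref = [None] * (height * width)  # per grid position: index in `kept` of its current representative

    for t, frame in enumerate(frames):
        new_ref = list(ref)
        for pos, tok in enumerate(frame):
            r, c = divmod(pos, width)
            best_sim, best_idx = -1.0, None
            # Search only a local window around the same position in the previous frame.
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < height and 0 <= cc < width:
                        k = ref[rr * width + cc]
                        if k is not None:
                            s = cosine(tok, kept[k][2])
                            if s > best_sim:
                                best_sim, best_idx = s, k
            if best_idx is not None and best_sim >= threshold:
                # Redundant token: fold it into the matched token as a running mean.
                entry = kept[best_idx]
                n = entry[3]
                entry[2] = [(v * n + x) / (n + 1) for v, x in zip(entry[2], tok)]
                entry[3] = n + 1
                new_ref[pos] = best_idx
            else:
                # Novel token: keep it and make it the representative for this position.
                kept.append([t, pos, list(tok), 1])
                new_ref[pos] = len(kept) - 1
        ref = new_ref

    return [(t, p, v) for t, p, v, _ in kept]


# Toy usage: two identical 2x2 frames followed by one frame with different content.
frame_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
frame_b = [[-1.0, 0.0], [0.0, -1.0], [-1.0, -1.0], [-1.0, 1.0]]
tokens = fuse_tokens([frame_a, frame_a, frame_b], height=2, width=2)
print(len(tokens), "of", 3 * 4, "tokens kept")
```

In this toy input the repeated frame contributes no new tokens, while the frame with different content is kept in full. Because the sketch only compares and merges token vectors and learns no parameters, it is training-free in the same sense the summary describes, and any reduction in token count directly shortens the sequence the LLM has to process during prefill.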