AI · Bullish · arXiv – CS AI · 5h ago · 7/10
Focus-then-Context: Subject-Centric Progressive Visual Token Reduction for Vision-Language Models
Researchers introduce SPpruner, an optimization technique for vision-language models (VLMs) that cuts inference cost by pruning visual tokens while preserving accuracy. The method prioritizes tokens for semantically relevant subjects and then tokens for their contextual relationships, and it achieves up to a 2.53x speedup with minimal performance loss, addressing a major bottleneck in VLM inference.
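To make the focus-then-context idea concrete, here is a minimal sketch of two-stage visual token pruning. It is not SPpruner's actual algorithm, which this summary does not describe. The function name `focus_then_context_prune`, the per-token `relevance` scores (for example, derived from text-to-image attention), and the use of cosine similarity to rank context tokens are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def focus_then_context_prune(tokens, relevance, n_focus, n_context):
    """Return sorted indices of the visual tokens to keep (hypothetical helper).

    tokens:    list of feature vectors, one per visual token
    relevance: assumed per-token subject-relevance scores, higher is better
    n_focus:   number of subject ("focus") tokens to keep in stage 1
    n_context: number of supporting ("context") tokens to keep in stage 2
    """
    # Stage 1 (focus): keep the tokens with the highest relevance scores.
    order = sorted(range(len(tokens)), key=lambda i: relevance[i], reverse=True)
    focus = order[:n_focus]
    rest = order[n_focus:]

    # Stage 2 (context): rank the remaining tokens by their best similarity
    # to any focus token, and keep the top n_context of them.
    def context_score(i):
        return max((cosine(tokens[i], tokens[j]) for j in focus), default=0.0)

    context = sorted(rest, key=context_score, reverse=True)[:n_context]

    # Restore the original token order so positional structure is preserved.
    return sorted(focus + context)


# Toy example: five 2-D "tokens" with made-up relevance scores.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0], [0.1, 0.9]]
relevance = [0.9, 0.2, 0.1, 0.05, 0.3]
kept = focus_then_context_prune(tokens, relevance, n_focus=1, n_context=2)
print(kept)  # [0, 1, 4]
```

In this toy run, token 0 is kept as the subject, and tokens 1 and 4 are kept because they are the most similar to it. The remaining two tokens are dropped, so later layers process three tokens instead of five.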