AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
VisionZip: Longer is Better but Not Necessary in Vision Language Models
Researchers introduce VisionZip, a method that reduces redundant visual tokens in vision-language models while maintaining performance. By passing only the most informative visual tokens on for processing, the technique speeds up inference by 8x and outperforms existing token-reduction methods by 5%.
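The core idea of keeping only informative tokens can be illustrated with a minimal sketch. It assumes each visual token already has an importance score, for example the attention it receives. The helper `select_informative_tokens` and its scoring input are hypothetical. This is not VisionZip's actual selection criterion, which this summary does not describe.

```python
from typing import List, Sequence


def select_informative_tokens(
    tokens: Sequence[List[float]],
    scores: Sequence[float],
    keep: int,
) -> List[List[float]]:
    """Keep the `keep` highest-scoring visual tokens, preserving original order.

    Hypothetical helper: `scores` stands in for any per-token importance
    signal (e.g. attention received); VisionZip's real criterion may differ.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if keep <= 0:
        return []
    # Rank token indices by descending score; ties go to the earlier token.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    # Re-sort the kept indices so the language model sees tokens in their
    # original spatial/sequence order.
    kept = sorted(ranked[:keep])
    return [list(tokens[i]) for i in kept]


# Usage: four 1-D "token embeddings", keep the two most important.
tokens = [[0.1], [0.2], [0.3], [0.4]]
scores = [0.05, 0.6, 0.1, 0.25]
print(select_informative_tokens(tokens, scores, keep=2))  # [[0.2], [0.4]]
```

Shrinking the visual token sequence this way shortens the input the language model must process, which is where the speedup in this kind of approach comes from.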