AI · Bullish · arXiv – CS AI · 10h ago · 6/10
VisionZip: Longer is Better but Not Necessary in Vision Language Models
Researchers introduce VisionZip, a method that removes redundant visual tokens in vision-language models by passing only the most informative tokens on for processing. The authors report 8x faster inference and 5% better performance than existing token-reduction methods.