arXiv – CS AI · 14h ago
OccamToken: Efficient VLM Inference with Training-Free and Budget-Adaptive Token Pruning
Researchers introduce OccamToken, a training-free method that accelerates vision-language model (VLM) inference by pruning unnecessary visual tokens. On LLaVA-NeXT, it shortens the visual token sequence by 98.6% (from 2,880 to 40 tokens) while preserving over 93% accuracy, addressing a key computational bottleneck in VLM inference.
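The summary does not say how OccamToken decides which tokens to keep. As a generic illustration of budget-constrained visual token pruning (not OccamToken's actual algorithm), the minimal sketch below keeps the `budget` highest-scoring tokens under a hypothetical per-token importance score and restores their original order. It also checks the reported figure: keeping 40 of 2,880 tokens removes 2,840 of them, about 98.6%. The function names and the scoring signal are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def prune_visual_tokens(
    tokens: Sequence[Sequence[float]],
    scores: Sequence[float],
    budget: int,
) -> Tuple[List[Sequence[float]], List[int]]:
    """Keep the `budget` highest-scoring tokens, preserving their original order.

    Hypothetical sketch: `scores` stands in for some per-token importance
    signal (e.g. attention weights); OccamToken's real criterion may differ.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if budget <= 0:
        return [], []
    budget = min(budget, len(tokens))  # cannot keep more tokens than exist
    # Rank indices by descending score; ties are broken by the lower index.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    # Re-sort the survivors so the pruned sequence keeps its original layout.
    kept = sorted(ranked[:budget])
    return [tokens[i] for i in kept], kept


def reduction_percent(before: int, after: int) -> float:
    """Percentage of tokens removed when going from `before` to `after`."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    toy_tokens = [[float(i)] for i in range(6)]  # six 1-D toy "visual tokens"
    toy_scores = [0.1, 0.9, 0.3, 0.8, 0.05, 0.4]
    kept_tokens, kept_idx = prune_visual_tokens(toy_tokens, toy_scores, budget=3)
    print(kept_idx)  # [1, 3, 5]
    print(round(reduction_percent(2880, 40), 1))  # 98.6
```

Sorting by score and then re-sorting the kept indices is one simple way to respect a fixed token budget without scrambling the spatial order of image patches; a budget-adaptive method would choose `budget` per input rather than fixing it in advance.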