AIBullish · arXiv – CS AI · 10h ago · 6/10
🧠
Evading Visual Aphasia: Contrastive Adaptive Semantic Token Pruning for Vision-Language Models
Researchers introduce COAST, a pruning framework for vision-language models that cuts visual tokens by 77.8% while retaining 98.64% of baseline performance and delivering a 2.15x speedup. Unlike existing methods that simply discard low-attention tokens, COAST uses adaptive semantic routing to preserve contextually essential information, preventing 'Visual Aphasia', a failure mode in which models lose their visual grounding.
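For context, the baseline approach COAST improves upon (dropping low-attention visual tokens) can be sketched as below. This is a minimal illustration, not COAST's actual routing: the function name `prune_visual_tokens` and the use of a single attention score per token are assumptions for the example, and the 0.222 keep ratio simply mirrors the 77.8% reduction reported in the post.

```python
import numpy as np

def prune_visual_tokens(tokens: np.ndarray, attn: np.ndarray,
                        keep_ratio: float = 0.222):
    """Keep only the highest-attention visual tokens.

    tokens: (N, D) visual token embeddings
    attn:   (N,)   one scalar attention score per token
    Returns the kept tokens and their original indices.
    """
    n_keep = max(1, int(round(len(attn) * keep_ratio)))
    keep_idx = np.argsort(attn)[-n_keep:]  # top-n_keep attention scores
    keep_idx.sort()                        # preserve original token order
    return tokens[keep_idx], keep_idx

# Example: 9 visual tokens of dimension 4 with random attention scores.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(9, 4))
attn = rng.random(9)
kept, idx = prune_visual_tokens(tokens, attn)
print(kept.shape)  # (2, 4): roughly 77.8% of tokens pruned
```

COAST's contribution, per the summary, is replacing this purely score-based cut with semantic routing so that low-attention but contextually essential tokens survive.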