AI · Bullish · arXiv CS AI · 5h ago
EvoPrune: Early-Stage Visual Token Pruning for Efficient MLLMs
Researchers developed EvoPrune, a new method that prunes visual tokens during the encoding stage of Multimodal Large Language Models (MLLMs) rather than after encoding. The technique achieves 2x inference speedup with less than 1% performance loss on video datasets, addressing efficiency bottlenecks in AI models processing high-resolution images and videos.