AI · Bullish · arXiv – CS AI · 18h ago · 7/10
RAPID: Layer-Wise Redundancy-Aware Pruning and Importance-Driven Token Merging for Efficient ViT
Researchers introduce RAPID, a depth-aware token reduction framework for Vision Transformers (ViTs) that applies different pruning and merging strategies at different network layers to cut computational cost while maintaining accuracy. The method outperforms existing approaches such as ToMe, with up to 4.29% higher accuracy under aggressive compression.
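To make the idea of depth-aware token reduction concrete, here is a minimal sketch in plain Python. It is not RAPID's published algorithm: the function names (`prune`, `merge`, `reduce_tokens`), the rule of pruning in the first half of the network and merging in the second, the per-token importance scores, and the greedy pairwise merge (simpler than ToMe's bipartite matching) are all assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def prune(tokens, scores, r):
    """Drop the r lowest-scoring tokens, keeping the original order."""
    r = max(0, min(r, len(tokens) - 1))  # always keep at least one token
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:len(tokens) - r])
    return [tokens[i] for i in keep], [scores[i] for i in keep]


def merge(tokens, scores, r):
    """Greedily merge the most similar pair r times (score-weighted average)."""
    tokens = [list(t) for t in tokens]  # copy so the caller's data is untouched
    scores = list(scores)
    for _ in range(max(0, min(r, len(tokens) - 1))):
        n = len(tokens)
        pairs = ((a, b) for a in range(n) for b in range(a + 1, n))
        i, j = max(pairs, key=lambda p: cosine(tokens[p[0]], tokens[p[1]]))
        total = scores[i] + scores[j]
        wi, wj = (scores[i] / total, scores[j] / total) if total > 0 else (0.5, 0.5)
        tokens[i] = [wi * x + wj * y for x, y in zip(tokens[i], tokens[j])]
        scores[i] = total  # merged token carries the combined importance
        del tokens[j], scores[j]
    return tokens, scores


def reduce_tokens(tokens, scores, layer, num_layers, r):
    """Hypothetical depth-aware schedule: prune early layers, merge late ones."""
    if layer < num_layers // 2:
        return prune(tokens, scores, r)
    return merge(tokens, scores, r)


tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
scores = [0.4, 0.1, 0.3, 0.2]
early_tokens, early_scores = reduce_tokens(tokens, scores, layer=0, num_layers=12, r=1)
late_tokens, late_scores = reduce_tokens(tokens, scores, layer=11, num_layers=12, r=1)
```

In this toy run, the early layer discards the least important token outright, while the late layer folds the two most similar tokens into one weighted average, so both paths shorten the sequence from four tokens to three. A real ViT would apply this per image on attention-derived scores and batched tensors.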