🧠 AI | ⚪ Neutral | Importance 6/10

AdaMerge: Salience-Aware Adaptive Token Merging for Training-Free Acceleration of Vision Transformers

arXiv – CS AI | Semi Lee, Hyejin Go, Hyesong Choi | May 28, 2026 at 04:00 AM

🤖 AI Summary

AdaMerge is a training-free method that accelerates Vision Transformers by making token merging salience-aware and by adapting the compression ratio layer by layer. On ImageNet-1k, it is reported to achieve a better accuracy-to-FLOPs trade-off than existing token reduction methods at every compute budget evaluated.

Analysis

AdaMerge targets a central computational bottleneck in Vision Transformers (ViTs): the cost of self-attention grows quadratically with the number of tokens, which constrains practical deployment. Earlier token merging approaches such as ToMe showed promise as training-free solutions, but they implicitly assumed that all tokens contribute equally to the model's output, so information was lost when compression became aggressive.
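
To make the quadratic scaling concrete, the following back-of-the-envelope sketch counts multiply-accumulate operations for a single self-attention layer. The function name `attention_macs` and the simplified cost model (four dense projections plus two token-by-token products, ignoring softmax and multi-head bookkeeping) are illustrative assumptions and are not taken from the paper. The values 197 tokens and width 768 are the standard ViT-B/16 configuration at 224×224 input.

```python
def attention_macs(num_tokens: int, dim: int) -> dict:
    """Approximate multiply-accumulates for one self-attention layer."""
    # Q, K, V and output projections: linear in the token count.
    projections = 4 * num_tokens * dim * dim
    # QK^T scores plus the attention-weighted sum of V: quadratic in tokens.
    pairwise = 2 * num_tokens * num_tokens * dim
    return {"projections": projections, "pairwise": pairwise}


full = attention_macs(197, 768)  # ViT-B/16 at 224x224: 196 patches + class token
half = attention_macs(98, 768)   # roughly half the tokens after merging

# Halving the tokens cuts the pairwise term about 4x, the projections about 2x.
print(full["pairwise"] / half["pairwise"])
print(full["projections"] / half["projections"])
```

Under this model, removing tokens pays off twice in the pairwise term, which is why token reduction is an attractive lever for ViT inference.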

The research builds on the established observation that attention in transformer architectures is non-uniform. Token salience varies significantly within a sequence, yet prior merging frameworks did not exploit this. AdaMerge introduces two components to close the gap. Salience-weighted similarity uses column-wise feature-affinity centrality to identify high-importance tokens and preserve them during merging. Adaptive merging intensity adjusts the compression ratio of each layer dynamically, based on how much redundancy the specific input exhibits.
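
The summary does not give AdaMerge's formulas, so the sketch below is only one possible reading of the two ideas, written in plain Python. Everything in it is an assumption made for illustration: the function names, the row-softmax affinity, the reading of "column-wise centrality" as the average affinity a token receives, the choice to treat higher centrality as higher salience, the `(1 - s_i)(1 - s_j)` weighting, and the threshold rule for the per-layer merge count. It is not the authors' implementation.

```python
import math


def cosine_similarity(tokens):
    """Pairwise cosine similarity between token feature vectors."""
    norms = [math.sqrt(sum(x * x for x in t)) for t in tokens]
    return [[sum(a * b for a, b in zip(u, v)) / (nu * nv)
             for v, nv in zip(tokens, norms)]
            for u, nu in zip(tokens, norms)]


def column_centrality(sim):
    """Assumed salience proxy: softmax each row of the similarity matrix,
    then average each column (how much affinity a token receives)."""
    n = len(sim)
    rows = []
    for row in sim:
        exps = [math.exp(x) for x in row]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return [sum(rows[i][j] for i in range(n)) / n for j in range(n)]


def salience_weighted_similarity(sim, salience):
    """Assumed weighting: lower the merge score of pairs that involve
    salient tokens, so those tokens are merged later or not at all."""
    n = len(sim)
    return [[sim[i][j] * (1.0 - salience[i]) * (1.0 - salience[j])
             for j in range(n)] for i in range(n)]


def adaptive_merge_count(sim, threshold=0.9, cap=None):
    """Hypothetical adaptive intensity: count tokens that have a
    near-duplicate partner and merge that many, pairwise, up to a cap."""
    n = len(sim)
    redundant = sum(
        1 for i in range(n)
        if max(sim[i][j] for j in range(n) if j != i) > threshold
    )
    limit = n // 2 if cap is None else cap
    return min(redundant // 2, limit)


# Two pairs of duplicate tokens: a highly redundant toy input.
tokens = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
sim = cosine_similarity(tokens)
salience = column_centrality(sim)
scores = salience_weighted_similarity(sim, salience)
merges = adaptive_merge_count(sim)
print(salience, merges)
```

In this toy input every token has an exact duplicate, so the hypothetical rule schedules two merges. An input with no near-duplicates would schedule none, which is the input-dependent behavior the paragraph above describes.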

Benchmark results show consistent improvements over competing approaches. On ViT-B/16 at 13.4G FLOPs, AdaMerge incurs only 1.06% accuracy degradation, compared with 1.45% for PiToMe and 4.62% for DSM. The gap widens at higher compression levels, which suggests the method is particularly effective under tight resource constraints. Because the method is training-free, these gains come without any retraining or fine-tuning.
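
The reported figures can be put in context with a little arithmetic. The snippet below is a small sketch, not part of the paper. The accuracy drops and the 13.4G budget come from the summary above, while the 17.6 GFLOPs baseline for an uncompressed ViT-B/16 is an assumption based on the commonly cited figure and is not stated in the source.

```python
# Assumption: commonly cited cost of an uncompressed ViT-B/16 (not from the source).
BASELINE_GFLOPS = 17.6
BUDGET_GFLOPS = 13.4  # compute budget quoted in the summary

# Reported accuracy degradation (%) at the 13.4G budget.
accuracy_drop = {"AdaMerge": 1.06, "PiToMe": 1.45, "DSM": 4.62}

# Share of compute removed relative to the assumed baseline.
flops_saved_pct = 100 * (BASELINE_GFLOPS - BUDGET_GFLOPS) / BASELINE_GFLOPS

# How much more accuracy each competitor loses than AdaMerge.
margins = {
    name: round(drop - accuracy_drop["AdaMerge"], 2)
    for name, drop in accuracy_drop.items()
    if name != "AdaMerge"
}

print(f"compute saved: {flops_saved_pct:.1f}%")
print(margins)
```

Under the assumed baseline, the 13.4G budget corresponds to removing a bit under a quarter of the compute.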

For practitioners deploying vision models under tight compute budgets, AdaMerge is a concrete step toward efficient transformer inference. The approach could plausibly extend beyond image classification to video processing and other vision tasks where large token counts make inference costly. Future work will likely explore combining it with other acceleration techniques and applying it to other transformer architectures.

Key Takeaways

→ AdaMerge combines salience-weighted token similarity with adaptive per-layer compression to improve Vision Transformer efficiency without retraining
→ The framework outperforms existing token-merging methods at every compute budget evaluated, with accuracy advantages widening at higher compression ratios
→ The training-free design allows deployment in existing systems without retraining or changes to training pipelines
→ Salience-aware mechanisms preserve high-importance tokens while aggressively merging redundant ones, reducing information loss during compression
→ On ViT-B/16 (ImageNet-1k) at 13.4G FLOPs, accuracy degradation is 1.06%, versus 1.45% for PiToMe and 4.62% for DSM

#vision-transformers #token-merging #model-acceleration #computational-efficiency #transformer-optimization #training-free-methods

