AI · Bullish · arXiv – CS AI · 9h ago · 7/10
HiDe: Rethinking The Zoom-IN method in High Resolution MLLMs via Hierarchical Decoupling
Researchers introduce HiDe, a training-free framework that improves the performance of Multimodal Large Language Models (MLLMs) on high-resolution images. The work finds that background interference, not object size, is the primary limitation. The method uses token-wise attention decoupling and layout-preserving techniques to achieve state-of-the-art results on multiple benchmarks while reducing memory usage by 75% compared to existing approaches.