AINeutralarXiv – CS AI · 18h ago6/10
🧠
MM-Matryoshka: Towards Budget-Elastic Visual Document Retrieval via a 2D Multimodal Matryoshka Training Framework
Researchers introduce MM-Matryoshka, a training framework that enables visual document retrievers to dynamically adjust computational and storage costs without requiring multiple models. The approach allows Vision-Language Models to optimize along two dimensions—vector width and encoder depth—while maintaining retrieval quality, addressing a key efficiency challenge in multimodal AI systems.