AI · Bullish · arXiv – CS AI · Mar 37/108
Securing the Floor and Raising the Ceiling: A Merging-based Paradigm for Multi-modal Search Agents
Researchers propose a training-free paradigm that gives Vision-Language Models (VLMs) multi-modal search capabilities through cross-modal model merging. The approach uses Optimal Brain Merging (OBM) to combine text-based search agents with base VLMs, without requiring expensive supervised training or reinforcement learning.
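
To make the idea of training-free merging concrete, here is a minimal sketch. It does not implement OBM, whose details are not given in this summary; it uses plain task-vector addition as a stand-in. It assumes the VLM's language backbone and the text search agent were both derived from the same base LLM, so their parameters line up by name. The function name `merge_task_vector`, the parameter names, and the toy weights are all hypothetical.

```python
from typing import Dict, List

# Toy stand-in for a checkpoint: parameter name -> flat list of weights.
Weights = Dict[str, List[float]]


def merge_task_vector(
    base_vlm: Weights,
    base_llm: Weights,
    agent_llm: Weights,
    scale: float = 1.0,
) -> Weights:
    """Add the agent's task vector (agent_llm - base_llm) to the VLM.

    Illustrative only: this is simple task arithmetic, not OBM.
    Parameters that exist only in the VLM (e.g. the vision encoder)
    are copied unchanged.
    """
    merged: Weights = {}
    for name, vlm_param in base_vlm.items():
        if name in base_llm and name in agent_llm:
            merged[name] = [
                v + scale * (a - b)
                for v, a, b in zip(vlm_param, agent_llm[name], base_llm[name])
            ]
        else:
            merged[name] = list(vlm_param)
    return merged


# Hypothetical toy weights, chosen so the arithmetic is exact.
base_llm = {"layer.w": [1.0, 2.0]}
agent_llm = {"layer.w": [1.5, 1.0]}  # text search agent fine-tuned from base_llm
base_vlm = {"layer.w": [2.0, 2.0], "vision.w": [0.25]}

merged = merge_task_vector(base_vlm, base_llm, agent_llm)
print(merged)  # {'layer.w': [2.5, 1.0], 'vision.w': [0.25]}
```

No gradient step is involved: the merged weights come from arithmetic on existing checkpoints, which is what makes this family of methods training-free.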