AI · Bullish · arXiv CS AI · 5h ago
Securing the Floor and Raising the Ceiling: A Merging-based Paradigm for Multi-modal Search Agents
Researchers propose a training-free paradigm that gives Vision-Language Models (VLMs) multi-modal search capabilities through cross-modal model merging. The approach uses Optimal Brain Merging (OBM) to combine text-based search agents with base VLMs, avoiding expensive supervised training and reinforcement learning.