MASER: Modality-Adaptive Specialist Routing for Embodied 3D Spatial Intelligence
Researchers introduce MASER, a framework that dynamically routes each question to specialized adapters of a vision-language model based on modality relevance, achieving 51.3% oracle agreement on the Open3D-VQA benchmark. The results show that no single modality is best for every spatial reasoning question, although point clouds are the strongest modality in over half of the test cases.
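
To make the routing idea and the reported metric concrete, here is a minimal Python sketch. It is not MASER's implementation. It assumes that routing means picking the adapter with the highest relevance score, and that "oracle agreement" means the fraction of questions where the routed modality matches the modality that actually answers best. The modality names other than point clouds, the function names, and the toy scores are all hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical modality set; the summary only confirms point clouds as one modality.
MODALITIES = ["rgb", "depth", "point_cloud"]


def route(relevance: Dict[str, float]) -> str:
    """Return the modality with the highest relevance score.

    Ties go to whichever modality comes first in MODALITIES.
    """
    return max(MODALITIES, key=lambda m: relevance[m])


def oracle_agreement(routed: List[str], oracle: List[str]) -> float:
    """Fraction of questions where the routed modality equals the oracle-best one.

    Assumed definition of the metric, not taken from the paper.
    """
    if not routed or len(routed) != len(oracle):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(r == o for r, o in zip(routed, oracle)) / len(routed)


def oracle_share(oracle: List[str]) -> Dict[str, float]:
    """Share of questions for which each modality is the oracle-best choice."""
    counts = Counter(oracle)
    return {m: counts[m] / len(oracle) for m in MODALITIES}


# Toy relevance scores for four questions (made up for illustration).
scores = [
    {"rgb": 0.2, "depth": 0.1, "point_cloud": 0.7},
    {"rgb": 0.6, "depth": 0.3, "point_cloud": 0.1},
    {"rgb": 0.1, "depth": 0.5, "point_cloud": 0.4},
    {"rgb": 0.3, "depth": 0.3, "point_cloud": 0.4},
]
# Toy oracle labels: which modality actually gave the best answer per question.
best = ["point_cloud", "point_cloud", "depth", "rgb"]

routed = [route(s) for s in scores]
agreement = oracle_agreement(routed, best)
shares = oracle_share(best)
```

On this toy data the router agrees with the oracle on two of four questions, and point clouds are the oracle-best modality for half of them. Under the assumed definition, the oracle share is what would back a claim like "point clouds are best in over half of cases", while oracle agreement measures how often the router finds the best modality.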