AI · Neutral · arXiv – CS AI · 3h ago · 6/10
Look on Demand: A Cognitive Scheduling Framework for Visual Evidence Acquisition in Multimodal Reasoning
Researchers propose CSMR, a multimodal reasoning framework in which the language model decides when to request visual evidence from independent perception modules. The design targets two structural limitations of existing vision-language approaches: converting images to text loses visual detail, and joint optimization introduces linguistic bias.
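The "look on demand" idea can be illustrated with a toy control loop. This is only a sketch under stated assumptions, not the paper's implementation: `EvidenceScheduler`, `toy_policy`, `fake_ocr`, and `gather` are hypothetical names, the policy is a hand-written rule standing in for the language model, and the perception module is a stub that returns a fixed string.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# All names below are hypothetical; none are taken from the CSMR paper.
PerceptionModule = Callable[[str], str]   # query -> textual finding
Request = Tuple[str, str]                 # (module name, query)
Policy = Callable[[str, List[str]], Optional[Request]]


@dataclass
class EvidenceScheduler:
    """Lets a reasoning policy decide when to call a perception module."""
    modules: Dict[str, PerceptionModule]
    max_requests: int = 3                 # assumed budget on evidence requests
    evidence: List[str] = field(default_factory=list)

    def run(self, question: str, policy: Policy) -> List[str]:
        for _ in range(self.max_requests):
            # The policy sees the question plus evidence gathered so far.
            request = policy(question, self.evidence)
            if request is None:           # policy is done looking
                break
            name, query = request
            self.evidence.append(self.modules[name](query))
        return self.evidence


def toy_policy(question: str, evidence: List[str]) -> Optional[Request]:
    # Stand-in for the language model: look once, only if the question needs it.
    if not evidence and "sign" in question:
        return ("ocr", "read the sign")
    return None


def fake_ocr(query: str) -> str:
    # Stub perception module; a real one would inspect the image.
    return "sign reads STOP"


def gather(question: str) -> List[str]:
    scheduler = EvidenceScheduler(modules={"ocr": fake_ocr})
    return scheduler.run(question, toy_policy)
```

Calling `gather("What does the sign say?")` triggers one request to the stub module, while a question that needs no visual input triggers none. Keeping the perception modules behind a plain callable interface mirrors the summary's description of them as independent of the reasoner.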