Visually-Guided Policy Optimization for Multimodal Reasoning
Researchers propose Visually-Guided Policy Optimization (VGPO), a framework that enhances vision-language models' ability to focus on visual information during reasoning tasks. The method addresses a fundamental limitation where text-dominated VLMs suffer from weak visual attention and temporal visual forgetting, improving performance on multimodal reasoning and visual-dependent tasks.