From Narrow to Panoramic Vision: Attention-Guided Cold-Start Reshapes Multimodal Reasoning
Researchers introduce Visual Attention Score (VAS) to analyze multimodal reasoning models, discovering that higher visual attention correlates strongly with better performance (r=0.9616). They propose AVAR framework that achieves 7% performance gains on Qwen2.5-VL-7B across multimodal reasoning benchmarks.