AI · Bullish · arXiv - CS AI · 7h ago · 7/10
Not All Tokens See Equally: Perception-Grounded Policy Optimization for Large Vision-Language Models
Researchers introduce Perception-Grounded Policy Optimization (PGPO), a fine-tuning framework for large vision-language models that concentrates the learning signal on vision-dependent tokens instead of weighting all tokens equally. In tests on the Qwen2.5-VL series, the authors report an average 18.7% performance gain across multimodal reasoning benchmarks.