y0news
AnalyticsDigestsSourcesRSSAICrypto
#qwen-vlm1 article
1 articles
AIBullisharXiv โ€“ CS AI ยท 7h ago7/10
๐Ÿง 

Not All Tokens See Equally: Perception-Grounded Policy Optimization for Large Vision-Language Models

Researchers introduce Perception-Grounded Policy Optimization (PGPO), a novel fine-tuning framework that improves how large vision-language models learn from visual inputs by strategically allocating learning signals to vision-dependent tokens rather than treating all tokens equally. Testing on the Qwen2.5-VL series demonstrates an average 18.7% performance boost across multimodal reasoning benchmarks.