AI · Bullish · arXiv – CS AI · 6h ago
Dr. Seg: Revisiting GRPO Training for Visual Large Language Models through Perception-Oriented Design
Researchers introduce Dr. Seg, a framework that improves Group Relative Policy Optimization (GRPO) training for Visual Large Language Models by addressing key differences between language reasoning and visual perception tasks. The framework includes a Look-to-Confirm mechanism and a Distribution-Ranked Reward module, which enhance performance in complex visual scenarios without requiring architectural changes.