y0news
#grpo3 articles
3 articles
AINeutralarXiv โ€“ CS AI ยท 5h ago1
๐Ÿง 

DIVA-GRPO: Enhancing Multimodal Reasoning through Difficulty-Adaptive Variant Advantage

Researchers have developed DIVA-GRPO, a new reinforcement learning method that improves multimodal large language model reasoning by adaptively adjusting problem difficulty distributions. The approach addresses key limitations in existing group relative policy optimization methods, showing superior performance across six reasoning benchmarks.

AIBullisharXiv โ€“ CS AI ยท 5h ago2
๐Ÿง 

MIST-RL: Mutation-based Incremental Suite Testing via Reinforcement Learning

Researchers propose MIST-RL, a reinforcement learning framework that improves AI code generation by creating more efficient test suites. The method achieves 28.5% higher fault detection while using 19.3% fewer test cases, demonstrating significant improvements in AI code verification efficiency.

AIBullisharXiv โ€“ CS AI ยท 5h ago1
๐Ÿง 

Dr. Seg: Revisiting GRPO Training for Visual Large Language Models through Perception-Oriented Design

Researchers introduce Dr. Seg, a new framework that improves Group Relative Policy Optimization (GRPO) training for Visual Large Language Models by addressing key differences between language reasoning and visual perception tasks. The framework includes a Look-to-Confirm mechanism and Distribution-Ranked Reward module that enhance performance in complex visual scenarios without requiring architectural changes.