y0news

#reward-reweighting News & Analysis

1 article tagged with #reward-reweighting. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

1 article
AI · Bullish · arXiv – CS AI · 1d ago · 7/10

MMR-GRPO: Accelerating GRPO-Style Training through Diversity-Aware Reward Reweighting

Researchers propose MMR-GRPO, a training optimization technique that accelerates Group Relative Policy Optimization (GRPO) for mathematical reasoning models by reweighting rewards based on completion diversity. The method achieves comparable performance while reducing training time by 70.2% and training steps by 47.9%, demonstrating consistent improvements across multiple model sizes and benchmarks.
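To make the idea of diversity-aware reward reweighting concrete, here is a minimal sketch in Python. It is not the paper's implementation: the summary does not give the formula, so the sketch assumes a greedy Maximal Marginal Relevance (MMR) style pass in which each completion's reward is scaled by one minus its highest similarity to the completions picked before it. The word-overlap similarity (`jaccard`), the trade-off parameter `lam`, and every function name are hypothetical choices made for illustration. Only the final step, normalizing rewards within a group by their mean and standard deviation, is the standard GRPO advantage.

```python
from statistics import mean, pstdev


def jaccard(a: str, b: str) -> float:
    """Word-set overlap; a hypothetical stand-in for a real similarity measure."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def mmr_weights(completions, rewards, lam=0.5):
    """Greedy MMR-style pass (assumed form, not taken from the paper).

    Each step picks the completion with the best trade-off between reward
    and redundancy, then assigns it the weight 1 - redundancy, where
    redundancy is its highest similarity to anything picked earlier.
    """
    selected = []
    weights = [0.0] * len(completions)
    remaining = set(range(len(completions)))

    def redundancy(i):
        return max(
            (jaccard(completions[i], completions[j]) for j in selected),
            default=0.0,
        )

    while remaining:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(
            sorted(remaining),
            key=lambda i: lam * rewards[i] - (1 - lam) * redundancy(i),
        )
        weights[best] = 1.0 - redundancy(best)
        selected.append(best)
        remaining.remove(best)
    return weights


def group_relative_advantages(rewards, eps=1e-6):
    """Standard GRPO-style normalization within one group of completions."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def reweighted_advantages(completions, rewards, lam=0.5):
    """Scale rewards by diversity weights, then normalize within the group."""
    weights = mmr_weights(completions, rewards, lam)
    scaled = [r * w for r, w in zip(rewards, weights)]
    return group_relative_advantages(scaled)


if __name__ == "__main__":
    group = [
        "x = 2 so answer 4",
        "x = 2 so answer 4",  # exact duplicate of the first completion
        "try factoring then answer 4",
        "answer 5",
    ]
    rewards = [1.0, 1.0, 1.0, 0.0]
    print(mmr_weights(group, rewards))
    print(reweighted_advantages(group, rewards))
```

In this toy group the exact duplicate ends up with weight zero, so it contributes no more learning signal than the incorrect answer, while the distinct correct solution keeps most of its reward. A practical version would likely use embedding-based similarity and a softer penalty. How the reported reductions in training time and steps arise is not described in the summary, so the sketch does not try to model it.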