y0news

#sgpo News & Analysis

2 articles tagged with #sgpo. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

sGPO: Trading Inference FLOPs for Training Efficiency in RLVR

Researchers introduce sGPO (sorted Group Policy Optimization), a training method for reinforcement learning with verifiable rewards (RLVR) that reduces wasted computation by using cheap inference to profile query difficulty and dynamically allocate training resources. The approach achieves a 3x reduction in total training compute while maintaining or improving performance, a significant efficiency gain for large-scale AI model training.
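To make the idea of profiling difficulty with cheap inference more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's algorithm: the names `estimate_pass_rate`, `allocate_rollouts`, and `solve_once` are hypothetical, and the allocation rule (weighting each query by `p * (1 - p)`, the variance of a binary reward with pass rate `p`) is an assumed heuristic chosen only because queries that are always solved or never solved carry no group-relative signal.

```python
def estimate_pass_rate(solve_once, query, probes=8):
    """Fraction of cheap inference probes that answer the query correctly.

    `solve_once` is a hypothetical callable returning True or False.
    """
    return sum(bool(solve_once(query)) for _ in range(probes)) / probes


def allocate_rollouts(pass_rates, budget):
    """Split a rollout budget across queries in proportion to p * (1 - p).

    Assumed heuristic: queries with pass rate 0 or 1 get no rollouts.
    Integer truncation may leave part of the budget unspent.
    """
    weights = {q: p * (1 - p) for q, p in pass_rates.items()}
    total = sum(weights.values())
    if total == 0:
        return {q: 0 for q in pass_rates}
    return {q: int(budget * w / total) for q, w in weights.items()}


# Example with made-up pass rates for three queries.
plan = allocate_rollouts({"easy": 1.0, "medium": 0.5, "hard": 0.0}, budget=16)
```

In this toy example the whole budget goes to the "medium" query, since the other two would yield identical rewards across a group.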

AI · Bullish · arXiv – CS AI · Mar 11 · 7/10

Stepwise Guided Policy Optimization: Coloring your Incorrect Reasoning in GRPO

Researchers introduce Stepwise Guided Policy Optimization (SGPO), a new framework that improves upon Group Relative Policy Optimization (GRPO) by learning from incorrect reasoning responses in large language model training. SGPO addresses the limitation where GRPO fails to update policies when all responses in a group are incorrect, showing improved performance across multiple model sizes and reasoning benchmarks.
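The GRPO limitation mentioned above can be seen in a few lines of Python. This is a minimal sketch of the commonly used group-normalized advantage (reward minus group mean, divided by group standard deviation); the function name and the `eps` stabilizer are illustrative choices, and the sketch does not show how SGPO itself extracts signal from incorrect responses.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its group's mean and standard deviation."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# A group with mixed outcomes gives positive and negative advantages.
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# A group where every response is incorrect gives all-zero advantages,
# so the policy gradient for that query vanishes.
all_wrong = group_relative_advantages([0.0, 0.0, 0.0, 0.0])
```

Because every reward in the all-incorrect group equals the group mean, each advantage is zero and the query contributes nothing to the update.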