y0news
#scaffolding
1 article
AI · Bullish · arXiv – CS AI · 5d ago · 7/10 · 3

Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning

Researchers introduced Scaf-GRPO, a training framework that targets the "learning cliff" in LLM reasoning. This is the point where a model fails every attempt on hard problems and so receives no useful learning signal. Scaf-GRPO addresses it by injecting strategic hints once training plateaus. The method raised Qwen2.5-Math-7B performance on the AIME24 benchmark by 44.3% relative to a vanilla GRPO baseline.
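The sketch below shows why the learning cliff stalls training. It assumes standard GRPO, where each sampled answer's reward is normalized against its group's mean and standard deviation. When every answer in a group scores zero, all advantages collapse to zero, so the problem contributes no gradient. The helpers `is_learning_cliff` and `scaffolded_prompt`, along with the hint format, are hypothetical illustrations of the idea of stepping in with hints. They are not the paper's implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each reward by its group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def is_learning_cliff(rewards: list[float]) -> bool:
    """Hypothetical check: every rollout in the group failed (all rewards zero)."""
    return all(r == 0 for r in rewards)


def scaffolded_prompt(problem: str, hints: list[str], level: int) -> str:
    """Hypothetical scaffolding: append the first `level` hints, ordered from
    abstract to concrete, to the problem statement."""
    if level <= 0:
        return problem
    return problem + "\n\nHints:\n" + "\n".join(hints[:level])


if __name__ == "__main__":
    # Every sampled answer is wrong: all advantages vanish, so this
    # problem contributes no policy-gradient signal.
    failed = [0.0, 0.0, 0.0, 0.0]
    print(group_relative_advantages(failed))  # [0.0, 0.0, 0.0, 0.0]
    print(is_learning_cliff(failed))          # True

    # A mixed group (e.g. after adding a hint) yields non-zero advantages again.
    mixed = [1.0, 0.0, 0.0, 1.0]
    print(group_relative_advantages(mixed))
```

In this sketch, hints are only a prompt-side change. The reward and advantage computation stay exactly as in plain GRPO.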