AIBullisharXiv โ CS AI ยท 5d ago7/103
๐ง
Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning
Researchers introduced Scaf-GRPO, a new training framework that overcomes the 'learning cliff' problem in LLM reasoning by providing strategic hints when models plateau. The method boosted Qwen2.5-Math-7B performance on the AIME24 benchmark by 44.3% relative to baseline GRPO methods.