AI · Bullish · arXiv – CS AI · 9h ago · 6/10
Goldilocks RL: Tuning Task Difficulty to Escape Sparse Rewards for Reasoning
Researchers introduce Goldilocks, a curriculum learning strategy that makes reinforcement learning (RL) for language models more sample-efficient. A teacher model dynamically selects training questions whose difficulty suits the student model, neither too easy nor too hard. The approach targets the sample inefficiency of sparse-reward RL training, and the authors report performance gains on reasoning tasks over standard approaches.
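The summary does not say how the teacher model scores difficulty. The sketch below is a hypothetical illustration of why difficulty matters under sparse rewards, and it does not reproduce the paper's method. It assumes binary pass/fail rewards and a GRPO-style group-normalized advantage. Under those assumptions, a question the student always solves (or never solves) gives every rollout the same reward, so the advantage, and therefore the learning signal, is zero. The paper's learned teacher is replaced here by a simple hand-written heuristic: track a smoothed success rate `p` per question and prefer questions with the largest Bernoulli variance `p * (1 - p)`. The names `DifficultyCurriculum`, `group_advantages`, and the `ema` smoothing factor are all hypothetical.

```python
from dataclasses import dataclass, field


def group_advantages(rewards):
    # GRPO-style normalization (assumed): (reward - group mean) / group std.
    # A group with identical rewards has zero std and yields no signal.
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    if std == 0:
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]


@dataclass
class DifficultyCurriculum:
    """Hypothetical heuristic teacher, NOT the Goldilocks paper's teacher model.

    Keeps a smoothed success-rate estimate per question and prefers questions
    the student currently solves about half the time.
    """
    num_questions: int
    ema: float = 0.3  # smoothing factor for success-rate updates (assumed)
    success: list = field(default_factory=list)

    def __post_init__(self):
        # Start every question at an uninformative 0.5 success estimate.
        if not self.success:
            self.success = [0.5] * self.num_questions

    def score(self, q):
        # Bernoulli variance p(1 - p): 0 when p is 0 or 1, maximal at p = 0.5.
        p = self.success[q]
        return p * (1.0 - p)

    def select(self, k):
        # Return the k questions with the highest expected learning signal.
        ranked = sorted(range(self.num_questions), key=self.score, reverse=True)
        return ranked[:k]

    def update(self, q, rewards):
        # Blend this rollout group's pass rate into the running estimate.
        observed = sum(rewards) / len(rewards)
        self.success[q] = (1 - self.ema) * self.success[q] + self.ema * observed


# Simulated student: question 0 is always solved, question 1 is never solved,
# and question 2 is solved half the time.
cur = DifficultyCurriculum(num_questions=3)
for _ in range(10):
    cur.update(0, [1, 1, 1, 1])
    cur.update(1, [0, 0, 0, 0])
    cur.update(2, [1, 0, 1, 0])

print(cur.select(1))                    # [2]
print(group_advantages([1, 1, 1, 1]))   # [0.0, 0.0, 0.0, 0.0]
print(group_advantages([1, 0, 1, 0]))   # [1.0, -1.0, 1.0, -1.0]
```

In this toy run the heuristic steers training toward question 2, the only one whose rollouts disagree and therefore produce nonzero advantages. A learned teacher, as described in the summary, could in principle predict this usefulness before any rollouts are spent on a question, whereas this heuristic must first observe rewards.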