AI · Bullish · arXiv – CS AI · 8h ago · 6/10
Miner: Mining Intrinsic Mastery for Data-Efficient RL in Large Reasoning Models
Researchers introduce Miner, a reinforcement learning method for large reasoning models that uses the model's own intrinsic uncertainty as a self-supervised reward signal to make training more data-efficient. According to the authors, the approach achieves state-of-the-art results on reasoning benchmarks, improving Pass@1 by up to 4.58 points over existing methods, and targets a key inefficiency in current critic-free RL training.
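To make the idea of "uncertainty as a reward signal in critic-free RL" more concrete, here is a minimal sketch under stated assumptions. It is not Miner's actual formulation, which this summary does not describe. It uses GRPO-style group-normalized advantages as the critic-free baseline and assumes, for illustration only, that the relevant inefficiency is a group of rollouts that all receive the same binary reward and therefore get zero advantage. The uncertainty measure (mean negative token log-probability), the shaping rule, the weight `beta`, and all function names are hypothetical.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Critic-free, GRPO-style advantages: normalize rewards within one prompt's group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def uncertainty(token_logprobs):
    """Hypothetical intrinsic signal: mean negative log-probability of sampled tokens."""
    return -mean(token_logprobs)


def shaped_rewards(correct, token_logprobs, beta=0.5):
    """Hypothetical shaping: binary correctness reward plus an uncertainty bonus
    applied only to correct rollouts (beta is an assumed weight)."""
    return [
        (1.0 if ok else 0.0) + (beta * uncertainty(lps) if ok else 0.0)
        for ok, lps in zip(correct, token_logprobs)
    ]


# One prompt, four rollouts, all correct: a "homogeneous" group.
correct = [True, True, True, True]
token_logprobs = [
    [-0.10, -0.20],
    [-0.50, -0.70],
    [-0.05, -0.05],  # most confident rollout
    [-1.00, -0.40],  # least confident rollout
]

plain = group_advantages([1.0 if ok else 0.0 for ok in correct])
shaped = group_advantages(shaped_rewards(correct, token_logprobs))

print("plain :", [round(a, 3) for a in plain])   # all zero: no learning signal
print("shaped:", [round(a, 3) for a in shaped])  # nonzero: less-confident correct rollouts rank higher
```

With plain binary rewards, every rollout in the all-correct group gets an advantage of zero, so the group contributes nothing to the update. The hypothetical uncertainty bonus breaks that tie and ranks correct-but-uncertain rollouts higher. Note that in this sketch an all-incorrect group still yields zero advantage, because the bonus is applied only to correct rollouts.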