AI · Bullish · arXiv – CS AI · 10h ago · 6/10
DARE: Difficulty-Adaptive Reinforcement Learning with Co-Evolved Difficulty Estimation
Researchers introduce DARE, a reinforcement learning framework that improves LLM training efficiency by co-evolving difficulty estimation with policy learning. The method addresses limitations of existing difficulty-aware selection techniques by combining adaptive difficulty estimation, diverse coverage sampling, and tailored training strategies across difficulty tiers.
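The summary gives no algorithmic detail, so the following is only a rough sketch of what difficulty-aware selection with an updating estimate could look like, not DARE's actual procedure. Everything here is an assumption made for illustration: the `DifficultyTracker` class, the moving-average update, the 0.3/0.7 tier thresholds, and the round-robin `sample_batch` helper are all hypothetical names and choices.

```python
import random
from collections import defaultdict


class DifficultyTracker:
    """Running per-prompt pass-rate estimate (hypothetical, not from the paper)."""

    def __init__(self, momentum=0.5, prior=0.5):
        self.momentum = momentum  # weight kept on the previous estimate
        self.prior = prior        # assumed pass rate for unseen prompts
        self.pass_rate = {}

    def update(self, prompt_id, successes, attempts):
        # Blend the latest rollout outcome into the running estimate, so the
        # estimate moves as the policy improves.
        observed = successes / attempts
        old = self.pass_rate.get(prompt_id, self.prior)
        self.pass_rate[prompt_id] = (
            self.momentum * old + (1 - self.momentum) * observed
        )

    def tier(self, prompt_id):
        # Thresholds are arbitrary illustrative values.
        rate = self.pass_rate.get(prompt_id, self.prior)
        if rate >= 0.7:
            return "easy"
        if rate >= 0.3:
            return "medium"
        return "hard"


def sample_batch(tracker, prompt_ids, batch_size, rng):
    """Round-robin over tiers so every non-empty tier is represented."""
    pools = defaultdict(list)
    for pid in prompt_ids:
        pools[tracker.tier(pid)].append(pid)
    for pool in pools.values():
        rng.shuffle(pool)

    batch = []
    while len(batch) < batch_size and any(pools.values()):
        for name in ("easy", "medium", "hard"):
            if pools[name] and len(batch) < batch_size:
                batch.append(pools[name].pop())
    return batch


if __name__ == "__main__":
    tracker = DifficultyTracker()
    tracker.update("p1", successes=8, attempts=8)  # policy solves it every time
    tracker.update("p3", successes=0, attempts=8)  # policy never solves it
    print(sample_batch(tracker, ["p1", "p2", "p3"], 3, random.Random(0)))
```

In a training loop, `update` would be called after each batch of rollouts, and each tier could then be routed to a different training strategy; how DARE does either of these is not stated in the summary.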