#curriculum-learning News & Analysis

55 articles tagged with #curriculum-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

Curriculum Reinforcement Learning Can Incentivize Reasoning Capacity in LLMs Beyond the Base Model

Researchers present a boundary-aware Curriculum Reinforcement Learning approach that improves large language model reasoning capacity beyond what standard RLVR methods achieve. Testing across Qwen, Llama, and DeepSeek models shows a 9.8-percentage-point improvement in pass@256 over the base models, suggesting a more scalable path for continued LLM progress.

🧠 Llama
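The summary does not say how pass@256 was computed. A common choice, assumed here, is the unbiased pass@k estimator: generate n ≥ k samples per problem, count the c correct ones, and estimate 1 - C(n-c, k) / C(n, k). A minimal sketch:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that were correct, k: attempt budget.
    Computes 1 - C(n - c, k) / C(n, k), the probability that at least one of
    k samples drawn without replacement from the n samples is correct.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Too few incorrect samples to fill all k slots, so success is certain.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Benchmark score: average the per-problem estimates over (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

Large k such as 256 requires at least 256 samples per problem; Python's arbitrary-precision integers keep the binomial ratio exact until the final division.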

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

INFUSER: Influence-Guided Self-Evolution Improves Reasoning

INFUSER is a novel self-evolution framework that enables language models to improve their reasoning capabilities through iterative co-training between a Generator and a Solver, using an influence-aware scoring mechanism rather than difficulty heuristics. The method achieves a 20% relative improvement on mathematical and coding benchmarks, demonstrating that adaptive curriculum learning can outperform larger frozen models.
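The summary does not describe how INFUSER computes its influence scores. As an illustration of the general idea only, a first-order (TracIn-style) estimate scores a candidate training example by the dot product of its loss gradient with the gradient of a held-out loss: a positive score means one SGD step on that example is expected to lower the held-out loss. The linear model, squared loss, and helper names below are assumptions, not the paper's method.

```python
def grad_squared_loss(w, x, y):
    """Gradient of 0.5 * (w . x - y)**2 with respect to the weights w."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [residual * xi for xi in x]


def influence_score(w, train_example, val_examples, lr=0.1):
    """First-order estimate of how much one SGD step on train_example lowers
    the summed validation loss: lr * grad_train . grad_val (positive = helpful)."""
    x_t, y_t = train_example
    g_train = grad_squared_loss(w, x_t, y_t)
    score = 0.0
    for x_v, y_v in val_examples:
        g_val = grad_squared_loss(w, x_v, y_v)
        score += lr * sum(a * b for a, b in zip(g_train, g_val))
    return score


def rank_by_influence(w, candidates, val_examples):
    """Order candidate (x, y) training examples from most to least helpful."""
    return sorted(candidates,
                  key=lambda ex: influence_score(w, ex, val_examples),
                  reverse=True)
```

The estimate comes from a first-order Taylor expansion, L_val(w - lr·g) ≈ L_val(w) - lr·(g · ∇L_val), so it is only reliable for small steps.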

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

CURE: Curriculum-guided Multi-task Training for Reliable Anatomy Grounded Report Generation

CURE is a curriculum learning framework that improves medical vision-language models' ability to generate accurate radiology reports with better visual grounding. Without requiring additional training data, the method achieves significant gains in grounding accuracy (+0.35 IoU) and report quality (+0.192 CXRFEScore), and reduces hallucinations by 18.6%.

🏢 Hugging Face
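Grounding accuracy here is reported as IoU (intersection over union) between predicted and reference regions. Assuming axis-aligned boxes in (x_min, y_min, x_max, y_max) format (the summary does not specify CURE's region format), IoU can be computed as:

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes contribute no intersection area.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7 ≈ 0.1429
```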

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

Reliable to Expressive: A Curriculum for Rubric-Following Safety Judges

Researchers developed a curriculum-based training method for safety judges that dramatically improves their consistency across different evaluation rubrics. The approach combines dynamic rubric generation with a staged learning process, achieving 94.12-94.88% accuracy with minimal variance across three different rubric styles, outperforming larger general-purpose and specialized LLMs.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

TRON: Targeted Rule-Verifiable Online Environments for Visual Reasoning RL

Researchers introduce TRON, an online environment framework that generates unlimited, verifiable training instances for visual reasoning reinforcement learning across 520 diverse tasks. The system enables scalable model training without fixed dataset constraints and demonstrates consistent performance improvements on multiple multimodal reasoning benchmarks.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders

Researchers introduce SAERL, a data engineering framework that uses Sparse Autoencoders to extract intrinsic signals from LLM internals for improved reinforcement learning post-training. By modeling data diversity, difficulty, and quality, the method achieves a 3% accuracy gain and 20% faster convergence on math reasoning tasks, demonstrating that model internals provide practical signals beyond external training data metrics.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

GraphDancer: Training LLMs to Explore and Reason over Graphs via Two-Stage Curriculum Post-Training

GraphDancer is a new post-training framework that enables large language models to reason over heterogeneous graph-structured data by combining natural-language reasoning with graph function execution. The two-stage curriculum approach uses structural complexity ordering to teach models to explore and reason over graphs, achieving strong cross-domain generalization with only a 3B-parameter backbone.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

Curriculum Learning for Safety Alignment

Researchers propose Staged-Competence, a curriculum learning framework that enhances Direct Preference Optimisation (DPO) for AI safety alignment. The method reduces out-of-distribution harmful responses by 16% and jailbreak success rates by 20% while maintaining model capabilities, achieving baseline safety with 25% less training data.
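The summary does not give Staged-Competence's schedule. One standard way to stage a curriculum, shown here purely as an assumption, is a square-root competence function c(t) that grows from an initial fraction c0 to 1 over T steps; at step t the trainer samples only from the easiest c(t) fraction of the data. The `difficulty` scorer is hypothetical, and how to rank preference pairs by difficulty is left open.

```python
import math
import random


def sqrt_competence(t, total_steps, c0=0.1):
    """Fraction of the difficulty-sorted data available at step t.
    Starts at c0 and grows like a square root until it reaches 1.0 at total_steps."""
    return min(1.0, math.sqrt(t * (1 - c0 ** 2) / total_steps + c0 ** 2))


def sample_batch(examples, difficulty, t, total_steps, batch_size, rng=random):
    """Sample (with replacement) from the easiest c(t) fraction of examples.
    `difficulty` is a hypothetical scoring function: lower means easier."""
    ranked = sorted(examples, key=difficulty)
    cutoff = max(1, int(len(ranked) * sqrt_competence(t, total_steps)))
    pool = ranked[:cutoff]
    return [rng.choice(pool) for _ in range(batch_size)]


# At step 0 only the easiest ~10% of 100 toy examples can be drawn.
early_batch = sample_batch(list(range(100)), lambda x: x, t=0,
                           total_steps=100, batch_size=50, rng=random.Random(0))
```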

AI · Bullish · arXiv – CS AI · May 12 · 7/10

EXPO: Exploration-Prioritized Policy Optimization via Adaptive KL Regulation and Gaussian Curriculum Sampling

Researchers introduce EXPO, an improved reinforcement learning algorithm for LLM mathematical reasoning that dynamically adjusts KL penalty coefficients and prioritizes moderately difficult problems during training. The method demonstrates significant performance improvements over existing GRPO approaches, including a 13.34-point absolute gain on the AIME 2025 benchmark.
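One plausible reading of "Gaussian curriculum sampling", assumed here rather than taken from the paper, weights each problem by a Gaussian in its empirical pass rate centered at 0.5. Problems the model always or never solves then are rarely drawn, which matters for GRPO-style training because a group whose rewards are all identical yields zero advantage and no learning signal. The center `mu` and width `sigma` are illustrative values.

```python
import math
import random


def gaussian_weights(pass_rates, mu=0.5, sigma=0.15):
    """Unnormalized sampling weight per problem: exp(-(p - mu)^2 / (2 sigma^2))."""
    return [math.exp(-((p - mu) ** 2) / (2 * sigma ** 2)) for p in pass_rates]


def sample_problems(problems, pass_rates, k, mu=0.5, sigma=0.15, rng=random):
    """Draw k problems with replacement, favoring pass rates near mu."""
    weights = gaussian_weights(pass_rates, mu, sigma)
    return rng.choices(problems, weights=weights, k=k)


problems = ["always-solved", "borderline", "never-solved"]
batch = sample_problems(problems, [1.0, 0.5, 0.0], k=1000, rng=random.Random(0))
```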

AI · Bullish · arXiv – CS AI · May 12 · 7/10

SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning

SimWorld Studio is an open-source platform that automatically generates diverse 3D environments for training embodied AI agents using an evolving coding agent called SimCoder. The system demonstrates significant performance improvements through self-evolution and co-evolution mechanisms, achieving 18-point success-rate gains in navigation tasks compared to fixed environments.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key

Researchers introduce ScaleLogic, a synthetic reasoning framework for systematically studying how reinforcement learning improves LLM reasoning across varying task difficulty and logical complexity. The study finds that RL training compute follows a power law in reasoning depth and that scaling efficiency improves when models train on more expressive logic, suggesting that the quality of training content matters as much as its volume.
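A power law C(d) = a · d^b between training compute C and reasoning depth d is a straight line in log-log space, log C = log a + b · log d, so the exponent b can be estimated by ordinary least squares on the logs. The data below is synthetic and for illustration only; the summary does not report fitted values.

```python
import math


def fit_power_law(depths, compute):
    """Least-squares fit of compute = a * depth**b in log-log space; returns (a, b)."""
    xs = [math.log(d) for d in depths]
    ys = [math.log(c) for c in compute]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope in log-log space = exponent
    a = math.exp(y_mean - b * x_mean)    # intercept back-transformed to a
    return a, b


# Synthetic example: compute grows as 3 * depth^2.
depths = [1, 2, 4, 8, 16]
a, b = fit_power_law(depths, [3 * d ** 2 for d in depths])
```

A smaller fitted exponent b means compute grows more slowly with depth, which is one way to read "better scaling efficiency".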

AI · Bullish · arXiv – CS AI · May 9 · 7/10

Time Series Reasoning via Process-Verifiable Thinking Data Synthesis and Scheduling for Tailored LLM Reasoning

Researchers introduce VeriTime, a framework that enhances large language models for time series analysis through synthetic data generation, intelligent data scheduling, and specialized reinforcement learning. The approach enables smaller models (3B-4B parameters) to match or exceed the reasoning capabilities of larger proprietary LLMs on time series tasks.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

Rethinking Data Curation in LLM Training: Online Reweighting Offers Better Generalization than Offline Methods

Researchers propose ADAPT, an online data reweighting framework that dynamically adjusts training sample importance during LLM training rather than using static offline selection methods. This approach maintains data diversity while improving generalization, outperforming existing offline curation techniques on instruction tuning and large-scale pretraining tasks.
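ADAPT's exact reweighting rule is not given in the summary. A generic online scheme, shown as an assumption, recomputes per-sample weights from the current losses at every step (here a temperature-scaled softmax, so higher-loss samples get more weight) and trains on the weighted loss. Because every sample keeps a nonzero weight, nothing is discarded, in contrast to hard offline filtering.

```python
import math


def softmax_weights(losses, temperature=1.0):
    """Per-sample weights proportional to exp(loss / temperature), summing to 1."""
    m = max(losses)  # subtract the max for numerical stability
    exps = [math.exp((l - m) / temperature) for l in losses]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_batch_loss(losses, temperature=1.0):
    """Batch loss with weights recomputed online from the current losses.

    In a real training loop the weights are usually treated as constants
    (no gradient flows through them); plain floats make that implicit here.
    """
    weights = softmax_weights(losses, temperature)
    return sum(w * l for w, l in zip(weights, losses))
```

The temperature controls how aggressive the reweighting is: as it grows, the weights approach uniform and the scheme reduces to ordinary averaging.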

AI · Bullish · arXiv – CS AI · Apr 7 · 7/10

Cog-DRIFT: Exploration on Adaptively Reformulated Instances Enables Learning from Hard Reasoning Problems

Researchers introduce Cog-DRIFT, a new framework that improves AI language model reasoning by transforming difficult problems into easier formats like multiple-choice questions, then gradually training models on increasingly complex versions. The method shows significant performance gains of 8-10% on previously unsolvable problems across multiple reasoning benchmarks.

🧠 Llama

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

Preventing Curriculum Collapse in Self-Evolving Reasoning Systems

Researchers introduce Prism, a new self-evolving AI reasoning system that prevents diversity collapse in problem generation by maintaining semantic coverage across mathematical problem spaces. The system achieved significant accuracy improvements over existing methods on mathematical reasoning benchmarks and generated 100k diverse mathematical questions.

AI · Bullish · arXiv – CS AI · Mar 11 · 7/10

SATURN: SAT-based Reinforcement Learning to Unleash LLMs Reasoning

Researchers introduce SATURN, a new reinforcement learning framework that uses Boolean Satisfiability (SAT) problems to improve large language models' reasoning capabilities. The framework addresses key limitations in existing RL approaches by enabling scalable task construction, automated verification, and precise difficulty control through curriculum learning.
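SAT suits RL reward design because checking a proposed assignment takes time linear in the formula size, even when finding one is hard. The sketch below is not SATURN's generator: it builds random k-SAT instances whose difficulty is set by hypothetical knobs (number of variables, clause count) and turns verification into a binary reward. The DIMACS-style literal encoding is an assumption.

```python
import random


def random_ksat(num_vars, num_clauses, k=3, rng=random):
    """Random k-SAT instance as a list of clauses in DIMACS-style literals:
    v means "variable v is True", -v means "variable v is False"."""
    clauses = []
    for _ in range(num_clauses):
        variables = rng.sample(range(1, num_vars + 1), k)  # k distinct variables
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    return clauses


def verify(clauses, assignment):
    """True iff the assignment (dict: variable -> bool) satisfies every clause.
    A variable missing from the assignment satisfies none of its literals."""
    return all(
        any(assignment.get(abs(lit)) == (lit > 0) for lit in clause)
        for clause in clauses
    )


def reward(clauses, assignment):
    """Binary, automatically verifiable reward for one RL rollout."""
    return 1.0 if verify(clauses, assignment) else 0.0


# Difficulty knobs (hypothetical): more variables, and a clause/variable
# ratio closer to the ~4.26 hardness peak of random 3-SAT.
instance = random_ksat(num_vars=20, num_clauses=85, rng=random.Random(0))
```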

AI · Bullish · arXiv – CS AI · Mar 5 · 6/10

R1-Code-Interpreter: LLMs Reason with Code via Supervised and Multi-stage Reinforcement Learning

Researchers developed R1-Code-Interpreter, a large language model that uses multi-stage reinforcement learning to autonomously generate code for step-by-step reasoning across diverse tasks. The 14B-parameter model achieves 72.4% accuracy on test tasks, outperforming GPT-4o variants and demonstrating emergent self-checking capabilities through code generation.

🏢 Hugging Face 🧠 GPT-4

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10

Curriculum Learning for Efficient Chain-of-Thought Distillation via Structure-Aware Masking and GRPO

Researchers developed a three-stage curriculum learning framework that improves Chain-of-Thought reasoning distillation from large language models to smaller ones. Through progressive skill acquisition and Group Relative Policy Optimization (GRPO), the method enables Qwen2.5-3B-Base to achieve an 11.29% accuracy improvement while reducing output length by 27.4%.
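GRPO replaces a learned value baseline with a group statistic: for each prompt it samples a group of responses, scores them, and normalizes each reward by the group's mean and standard deviation. A minimal sketch of that advantage computation follows; the population standard deviation and the `eps` guard are choices made here (implementations differ), and the article does not describe its reward design.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages for one prompt's group of sampled responses:
    (r_i - mean(group)) / (std(group) + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; eps avoids division by zero
    return [(r - mean) / (std + eps) for r in rewards]
```

When every response in a group earns the same reward, all advantages are zero, so that prompt contributes no policy-gradient signal.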

AI · Neutral · arXiv – CS AI · Jun 25 · 6/10

Transferability for General Reasoning: An Automated Curriculum for Multi-Domain RLVR

Researchers propose Transfer-Aware Curriculum (TAC), a machine learning optimization technique that dynamically adjusts training priorities across multiple domains by measuring how well improvements in one area transfer to others. The method achieves superior performance on reasoning tasks compared to fixed curricula, suggesting that cross-domain transferability is a critical factor for training more capable AI systems.

🧠 Llama

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

The Two-Hump Problem: Bridging the Difficulty Gap in Mathematical Reinforcement Learning

Researchers identify a critical structural problem in reinforcement learning for mathematical search tasks, specifically the Andrews-Curtis conjecture, characterized by a 'two-hump' distribution where instances are either trivial or unsolvable. The team addresses this through novel data generation techniques, algorithmic enhancements including supermoves and Transformer architectures, and releases two large-scale benchmark datasets (AC-19 and AC-1M) to advance the field.

AI · Bullish · arXiv – CS AI · Jun 23 · 6/10

EvoRubrics: Dynamic Rubrics as Rewards via Adversarial Co-Evolution for LLM Reinforcement Learning

EvoRubrics introduces a co-evolutionary reinforcement learning framework where a Policy LLM and Rubric Generator jointly improve through adversarial interaction, addressing the limitation of static reward criteria that lose discriminative power as models improve. The approach enables real-time evaluation adaptation and generates transferable reward models, with experiments showing consistent improvements over static and dynamic baselines.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Two-Bridge: Exclusive Objectives and Extended Horizon StarCraft II Benchmark

Researchers have introduced Two-Bridge, a new intermediate benchmark for StarCraft II that bridges the gap between oversimplified mini-games and computationally expensive full-game scenarios. The benchmark isolates tactical skills like navigation and micro-combat while removing economy mechanics, enabling more efficient reinforcement learning research on real-time strategy environments.

AI · Bullish · arXiv – CS AI · Jun 23 · 6/10

Beyond the Next Step: Variable-Length Latent World Models for Long-Horizon Planning

Researchers propose Variable-Length Latent World Models (VLWMs), a novel framework that predicts future environment states across variable action sequence lengths rather than single steps, addressing a fundamental limitation in AI planning. The approach achieves 13% performance improvements over existing latent world models on long-horizon control tasks through curriculum training and specialized planning methods.

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

When, Where, and How: Adaptive Binning for Tabular Self-Supervised Learning

Researchers introduce Adaptive Binning, a self-supervised learning method for medical tabular data that dynamically adjusts feature discretization during training rather than using fixed global quantization. The approach combines curriculum learning with representation-aware binning to improve performance on unlabeled clinical datasets, alongside a new standardized benchmark for medical tabular SSL evaluation.
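For reference, one common form of the fixed global quantization that Adaptive Binning moves away from, assumed here as the baseline, is equal-frequency (quantile) binning: bin edges are computed once from the whole feature column and never change during training. A minimal sketch of that baseline, with the number of bins as an illustrative parameter:

```python
import bisect
import statistics


def quantile_edges(values, num_bins):
    """Interior bin edges that split `values` into num_bins roughly equal-count bins."""
    return statistics.quantiles(values, n=num_bins)


def discretize(value, edges):
    """Bin index in [0, len(edges)] for a single value."""
    return bisect.bisect_right(edges, value)


values = list(range(1, 101))
edges = quantile_edges(values, 4)  # computed once, globally, then frozen
```

An adaptive method would instead revisit these edges during training, for example per feature or as the learned representation changes.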

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

Manifold Bandits: Bayesian Curriculum Learning over the Latent Geometry of Large Language Models

Researchers propose Bayesian Manifold Curriculum (BMC), a new framework for training large language models through reinforcement learning that treats problem sampling as a structured bandit problem rather than independent tasks. The approach organizes problems hierarchically and balances difficulty, diversity, and task relevance, showing that difficulty alone is insufficient for optimal model improvement.
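BMC treats problem selection as a structured bandit. The sketch below is a much simpler stand-in, not BMC: independent Beta-Bernoulli Thompson sampling over hypothetical problem clusters, where the feedback for picking a cluster is a binary "training on it helped" signal whose definition is an assumption here. BMC's hierarchical organization and latent-geometry modeling are not reproduced.

```python
import random


class ThompsonCurriculum:
    """Beta-Bernoulli Thompson sampling over problem clusters."""

    def __init__(self, clusters, rng=None):
        self.rng = rng or random.Random()
        # Beta(1, 1), i.e. uniform, prior on each cluster's helpfulness rate.
        self.alpha = {c: 1.0 for c in clusters}
        self.beta = {c: 1.0 for c in clusters}

    def select(self):
        """Draw a plausible helpfulness rate per cluster and pick the highest."""
        draws = {c: self.rng.betavariate(self.alpha[c], self.beta[c])
                 for c in self.alpha}
        return max(draws, key=draws.get)

    def update(self, cluster, helped):
        """Posterior update from one binary learning-progress signal."""
        if helped:
            self.alpha[cluster] += 1.0
        else:
            self.beta[cluster] += 1.0


# Usage: simulate two clusters with made-up helpfulness rates.
true_rates = {"algebra": 0.8, "geometry": 0.2}  # hypothetical
curriculum = ThompsonCurriculum(true_rates, rng=random.Random(0))
env = random.Random(1)
pulls = {c: 0 for c in true_rates}
for _ in range(500):
    chosen = curriculum.select()
    pulls[chosen] += 1
    curriculum.update(chosen, env.random() < true_rates[chosen])
```

Sampling from the posterior rather than always picking the current best estimate keeps some exploration of less-tried clusters, which is one way to avoid collapsing onto a single difficulty band.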

Page 1 of 3