#curriculum-learning News & Analysis

37 articles tagged with #curriculum-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 14h ago · 7/10

TRON: Targeted Rule-Verifiable Online Environments for Visual Reasoning RL

Researchers introduce TRON, an online environment framework that generates unlimited, verifiable training instances for visual reasoning reinforcement learning across 520 diverse tasks. The system enables scalable model training without fixed dataset constraints and demonstrates consistent performance improvements on multiple multimodal reasoning benchmarks.

AI · Bullish · arXiv – CS AI · 6d ago · 7/10

Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders

Researchers introduce SAERL, a data engineering framework that uses Sparse Autoencoders to extract intrinsic signals from LLM internals for improved reinforcement learning post-training. The method achieves 3% accuracy gains and 20% faster convergence on math reasoning tasks by modeling data diversity, difficulty, and quality—demonstrating that model internals provide practical signals beyond external training data metrics.

AI · Bullish · arXiv – CS AI · 6d ago · 7/10

GraphDancer: Training LLMs to Explore and Reason over Graphs via Two-Stage Curriculum Post-Training

GraphDancer is a new post-training framework that enables large language models to reason over heterogeneous graph-structured data by combining natural-language reasoning with graph function execution. The two-stage curriculum approach uses structural complexity ordering to teach models to explore and reason over graphs, achieving strong cross-domain generalization with only a 3B-parameter backbone.
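
To make "structural complexity ordering" concrete, here is a minimal sketch of ordering graph tasks from simple to complex and splitting them into two stages. The `GraphTask` record and the complexity score (reasoning hops plus distinct node types) are hypothetical stand-ins; GraphDancer's actual complexity measure and stage boundary may differ.

```python
from dataclasses import dataclass


@dataclass
class GraphTask:
    question: str
    hops: int        # hypothetical: number of reasoning hops over the graph
    node_types: int  # hypothetical: distinct node types the task touches


def complexity(task: GraphTask) -> int:
    # Illustrative score only; not GraphDancer's published measure.
    return task.hops + task.node_types


def two_stage_curriculum(tasks, split=0.5):
    """Sort tasks by structural complexity, then cut into two training stages."""
    ordered = sorted(tasks, key=complexity)
    cut = int(len(ordered) * split)
    return ordered[:cut], ordered[cut:]


tasks = [GraphTask("a", 3, 2), GraphTask("b", 1, 1),
         GraphTask("c", 2, 2), GraphTask("d", 4, 3)]
stage1, stage2 = two_stage_curriculum(tasks)
```

Here the first stage receives the two structurally simplest tasks ("b", "c") and the second stage the harder ones ("a", "d").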

AI · Bullish · arXiv – CS AI · 6d ago · 7/10

Curriculum Learning for Safety Alignment

Researchers propose Staged-Competence, a curriculum learning framework that enhances Direct Preference Optimisation (DPO) for AI safety alignment. The method reduces out-of-distribution harmful responses by 16% and jailbreak success rates by 20% while maintaining model capabilities, and it matches baseline safety levels with 25% less training data.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning

SimWorld Studio is an open-source platform that automatically generates diverse 3D environments for training embodied AI agents using an evolving coding agent called SimCoder. The system demonstrates significant performance improvements through self-evolution and co-evolution mechanisms, achieving 18-point success-rate gains in navigation tasks compared to fixed environments.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

EXPO: Exploration-Prioritized Policy Optimization via Adaptive KL Regulation and Gaussian Curriculum Sampling

Researchers introduce EXPO, an improved reinforcement learning algorithm for LLM mathematical reasoning that dynamically adjusts KL penalty coefficients and prioritizes moderately difficult problems during training. The method demonstrates significant performance improvements over existing GRPO approaches, achieving a 13.34-point absolute gain on the AIME 2025 benchmark.
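
One common way to "prioritize moderately difficult problems" is to weight each problem by a Gaussian over its estimated pass rate. The sketch below assumes a per-problem pass rate is already tracked and uses illustrative values `mu=0.5`, `sigma=0.15`; these are not EXPO's published settings, and the adaptive KL component is not modeled here.

```python
import math
import random


def gaussian_weights(pass_rates, mu=0.5, sigma=0.15):
    """Weight each problem by a Gaussian centred on a target pass rate.

    mu and sigma are illustrative assumptions. Problems solved about half
    the time get weight 1.0; near-always or near-never solved problems
    get almost none, since they yield little learning signal.
    """
    return [math.exp(-((p - mu) ** 2) / (2 * sigma ** 2)) for p in pass_rates]


def sample_batch(problems, pass_rates, k, rng=None):
    """Draw k problems (with replacement) in proportion to their weights."""
    rng = rng if rng is not None else random.Random()
    return rng.choices(problems, weights=gaussian_weights(pass_rates), k=k)


weights = gaussian_weights([0.95, 0.5, 0.05])
batch = sample_batch(["easy", "medium", "hard"], [0.95, 0.5, 0.05],
                     k=8, rng=random.Random(0))
```

Because the weights are symmetric around `mu`, an almost-solved problem and an almost-unsolved one are down-weighted equally.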

AI · Bullish · arXiv – CS AI · May 9 · 7/10

Rethinking Data Curation in LLM Training: Online Reweighting Offers Better Generalization than Offline Methods

Researchers propose ADAPT, an online data reweighting framework that dynamically adjusts training sample importance during LLM training rather than using static offline selection methods. This approach maintains data diversity while improving generalization, outperforming existing offline curation techniques on instruction tuning and large-scale pretraining tasks.
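
The difference between online reweighting and offline selection can be illustrated with a generic scheme: recompute importance weights from current per-sample losses at every step instead of filtering the dataset once. The softmax-over-losses rule and the `temperature` parameter below are a swapped-in illustration, not ADAPT's actual weighting rule.

```python
import math


def online_weights(losses, temperature=1.0):
    """Turn current per-sample losses into importance weights summing to 1.

    Softmax over losses is an illustrative stand-in, not ADAPT's rule.
    Weights are recomputed every step, so no sample is permanently
    discarded the way a one-off offline filter would discard it.
    """
    scaled = [l / temperature for l in losses]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def weighted_step_loss(losses, temperature=1.0):
    """Combine a batch's per-sample losses using the online weights."""
    weights = online_weights(losses, temperature)
    return sum(w * l for w, l in zip(weights, losses))
```

A higher `temperature` flattens the weights toward uniform, which is one simple knob for keeping the batch diverse.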

AI · Bullish · arXiv – CS AI · May 9 · 7/10

Time Series Reasoning via Process-Verifiable Thinking Data Synthesis and Scheduling for Tailored LLM Reasoning

Researchers introduce VeriTime, a framework that enhances large language models for time series analysis through synthetic data generation, intelligent data scheduling, and specialized reinforcement learning. The approach enables smaller models (3B-4B parameters) to match or exceed the reasoning capabilities of larger proprietary LLMs on time series tasks.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key

Researchers introduce ScaleLogic, a synthetic reasoning framework that systematically studies how reinforcement learning improves LLM reasoning across varying task difficulty and logical complexity. The study reveals that RL training compute follows a power law with reasoning depth, with scaling efficiency improving when models train on more expressively complex logic, suggesting that training content quality matters as much as training volume.

AI · Bullish · arXiv – CS AI · Apr 7 · 7/10

Cog-DRIFT: Exploration on Adaptively Reformulated Instances Enables Learning from Hard Reasoning Problems

Researchers introduce Cog-DRIFT, a new framework that improves AI language model reasoning by transforming difficult problems into easier formats like multiple-choice questions, then gradually training models on increasingly complex versions. The method shows significant performance gains of 8-10% on previously unsolvable problems across multiple reasoning benchmarks.

🧠 Llama

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

Preventing Curriculum Collapse in Self-Evolving Reasoning Systems

Researchers introduce Prism, a new self-evolving AI reasoning system that prevents diversity collapse in problem generation by maintaining semantic coverage across mathematical problem spaces. The system achieved significant accuracy improvements over existing methods on mathematical reasoning benchmarks and generated 100k diverse mathematical questions.

AI · Bullish · arXiv – CS AI · Mar 11 · 7/10

SATURN: SAT-based Reinforcement Learning to Unleash LLMs Reasoning

Researchers introduce SATURN, a new reinforcement learning framework that uses Boolean Satisfiability (SAT) problems to improve large language models' reasoning capabilities. The framework addresses key limitations in existing RL approaches by enabling scalable task construction, automated verification, and precise difficulty control through curriculum learning.
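
SAT tasks suit RL because instances are cheap to generate and answers are cheap to check. A minimal sketch, assuming a random k-SAT generator and a binary reward; the generator, its difficulty knobs (`n_vars`, `n_clauses`) and the reward function are illustrative and not SATURN's actual task construction.

```python
import random


def random_ksat(n_vars, n_clauses, k=3, rng=None):
    """Generate a random k-SAT instance as a list of clauses.

    Each literal is a nonzero int (DIMACS-style): +v means variable v,
    -v means NOT v. Requires k <= n_vars.
    """
    rng = rng if rng is not None else random.Random()
    clauses = []
    for _ in range(n_clauses):
        variables = rng.sample(range(1, n_vars + 1), k)  # k distinct variables
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    return clauses


def verify(clauses, assignment):
    """Check a proposed assignment {var: bool}; unassigned vars count as False."""
    return all(
        any(assignment.get(abs(lit), False) == (lit > 0) for lit in clause)
        for clause in clauses
    )


def reward(clauses, assignment):
    """Binary verifiable reward: 1.0 only if every clause is satisfied."""
    return 1.0 if verify(clauses, assignment) else 0.0
```

Scaling `n_vars` and `n_clauses` up over training gives a crude difficulty ramp; the clause-to-variable ratio matters too, since random 3-SAT is hardest near roughly 4.26 clauses per variable.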

AI · Bullish · arXiv – CS AI · Mar 5 · 6/10

R1-Code-Interpreter: LLMs Reason with Code via Supervised and Multi-stage Reinforcement Learning

Researchers developed R1-Code-Interpreter, a large language model that uses multi-stage reinforcement learning to autonomously generate code for step-by-step reasoning across diverse tasks. The 14B-parameter model achieves 72.4% accuracy on test tasks, outperforming GPT-4o variants and demonstrating emergent self-checking capabilities through code generation.

🏢 Hugging Face · 🧠 GPT-4

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10

Curriculum Learning for Efficient Chain-of-Thought Distillation via Structure-Aware Masking and GRPO

Researchers developed a three-stage curriculum learning framework that improves Chain-of-Thought reasoning distillation from large language models to smaller ones. The method enables Qwen2.5-3B-Base to achieve an 11.29% accuracy improvement while reducing output length by 27.4%, through progressive skill acquisition and Group Relative Policy Optimization (GRPO).

AI · Bullish · arXiv – CS AI · 1d ago · 6/10

D$^3$: Dynamic Directional Graph-Constrained Data Scheduling for LLM Training

Researchers introduce D³, a novel data scheduling framework for LLM training that models interactions between training samples as a dynamic directional graph to optimize training order. The approach outperforms existing data scheduling methods while maintaining computational efficiency through an approximation algorithm.

AI · Neutral · arXiv – CS AI · 1d ago · 6/10

PROWL: Prioritized Regret-Driven Optimization for World Model Learning

Researchers introduce PROWL, an adversarial training framework that improves world model robustness by actively discovering failure modes rather than passively learning from demonstration data. The approach uses a KL-constrained policy to expose high-error trajectories in diffusion-based video models while maintaining behavioral constraints, with a prioritized buffer that focuses training on unresolved weaknesses. Results demonstrate significant improvements in handling rare, interaction-critical transitions that matter for downstream planning and policy performance.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

Tailoring the Curriculum: Student-Centered Reasoning Distillation via Dynamic Data-Model Compatibility

Researchers introduce the Data-Model Compatibility (DMC) metric to evaluate how well training datasets align with student models during reasoning distillation from large language models. The metric jointly assesses data quality, difficulty, and student capability, demonstrating strong correlation with distillation performance and enabling dynamic dataset selection that improves outcomes across multiple models and tasks.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

Micro-Macro Retrieval: Reducing Long-Form Hallucination in Large Language Models

Researchers propose Micro-Macro Retrieval (M2R), a framework that reduces hallucination in large language models during long-form text generation by keeping key information close to the model's output. The method combines coarse-grained external retrieval with fine-grained extraction from an internal knowledge repository, targeting a critical bottleneck in which the proximity of evidence to the final answer correlates directly with factual accuracy.

AI · Bullish · arXiv – CS AI · 4d ago · 6/10

Beyond Normalization: Rethinking the Partition Function as a Difficulty Scheduler for RLVR

Researchers propose PACED-RL, a novel post-training framework that reinterprets the partition function in GFlowNet-based LLM training as a difficulty scheduler rather than merely a normalizer. By leveraging per-prompt accuracy signals, the method improves sample efficiency and maintains generation diversity while outperforming existing reward-maximizing approaches.

AI · Neutral · arXiv – CS AI · 5d ago · 6/10

Restoring the Sweet Spot: Pass-Rate Weighted Self-Distillation for LLM Reasoning

Researchers propose SC-SDPO, an improved machine learning technique that enhances how large language models learn from their own feedback during training. By weighting training examples based on question difficulty, the method achieves 3-4% performance gains on reasoning benchmarks while maintaining stable training dynamics.
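
Pass-rate weighting can be sketched as follows: estimate each question's pass rate from several sampled answers, then scale its loss so that questions in the "sweet spot" dominate. The `4 * p * (1 - p)` weight is an illustrative choice that peaks at a 50% pass rate; it is an assumption, not SC-SDPO's published formula.

```python
def pass_rate(rollout_rewards):
    """Fraction of sampled answers to one question that were correct (0/1)."""
    return sum(rollout_rewards) / len(rollout_rewards)


def difficulty_weight(p):
    """Weight 1.0 at p = 0.5, falling to 0.0 for always- or never-solved questions.

    4 * p * (1 - p) is an illustrative assumption, not SC-SDPO's formula.
    """
    return 4.0 * p * (1.0 - p)


def weighted_loss(per_question_losses, per_question_rewards):
    """Average per-question losses using pass-rate-based weights."""
    weights = [difficulty_weight(pass_rate(r)) for r in per_question_rewards]
    total = sum(weights)
    if total == 0:
        return 0.0  # every question was trivially easy or impossible
    return sum(w * l for w, l in zip(weights, per_question_losses)) / total
```

With rewards `[[1, 1, 1, 1], [1, 0, 1, 0]]`, the always-solved first question gets zero weight and the batch loss equals the second question's loss.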

AI · Neutral · arXiv – CS AI · 5d ago · 6/10

Mechanistically Interpreting the Role of Sample Difficulty in RLVR for LLMs

Researchers mechanistically analyze how sample difficulty affects Reinforcement Learning with Verifiable Rewards (RLVR) training in large language models, discovering that medium-difficulty problems yield optimal reasoning improvements while overly hard problems degrade performance. The study proposes difficulty-adaptive strategies using backward-reasoning reformulation and sparse autoencoders to optimize reward signals during training.

AI · Bullish · arXiv – CS AI · 6d ago · 6/10

Learning to Act under Noise: Enhancing Agent Robustness via Noisy Environments

Researchers introduce NoisyAgent, a training framework that improves the robustness of large language model agents by deliberately exposing them to environmental imperfections during training. By simulating real-world interaction noise, including user ambiguity and tool failures, the approach bridges the gap between idealized benchmark performance and practical deployment reliability.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

CrossVL: Complexity-Aware Feature Routing and Paired Curriculum for Cross-View Vision-Language Detection

CrossVL introduces a novel framework combining Complexity-Aware Pathway Aggregation and Paired Curriculum Learning to improve vision-language model performance in cross-view object detection scenarios. The approach addresses fundamental challenges when models operate across different viewpoints (ground and aerial), achieving measurable improvements in detection accuracy and consistency on the MAVREC dataset.

AI · Bullish · arXiv – CS AI · May 11 · 6/10

Goldilocks RL: Tuning Task Difficulty to Escape Sparse Rewards for Reasoning

Researchers introduce Goldilocks, a curriculum learning strategy that improves reinforcement learning efficiency for language models by having a teacher model dynamically select training questions of optimal difficulty for the student model. This addresses the sample inefficiency problem in sparse-reward RL training and demonstrates performance gains on reasoning tasks compared to standard approaches.

AI · Neutral · arXiv – CS AI · May 7 · 6/10

Overcoming Environmental Meta-Stationarity in MARL via Adaptive Curriculum and Counterfactual Group Advantage

Researchers propose CL-MARL, a curriculum learning framework for multi-agent reinforcement learning that dynamically adjusts task difficulty based on agent performance, addressing a fundamental limitation where fixed-difficulty training constrains policy generalization. The method achieves a 40% win rate on complex cooperative tasks, outperforming existing baselines by significant margins.
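
Performance-driven difficulty adjustment is often implemented as a simple controller over a recent win-rate window. The sketch below is a generic version of that idea; the window size, promotion and demotion thresholds, and number of levels are hypothetical and not taken from CL-MARL.

```python
from collections import deque


class AdaptiveCurriculum:
    """Raise or lower task difficulty based on the recent win rate.

    All thresholds and sizes are hypothetical defaults for illustration.
    """

    def __init__(self, levels=5, window=20, promote_at=0.7, demote_at=0.3):
        self.level = 0
        self.max_level = levels - 1
        self.results = deque(maxlen=window)  # keeps only the last `window` episodes
        self.promote_at = promote_at
        self.demote_at = demote_at

    def record(self, won: bool) -> int:
        """Log one episode outcome and return the difficulty level to use next."""
        self.results.append(won)
        if len(self.results) == self.results.maxlen:
            rate = sum(self.results) / len(self.results)
            if rate >= self.promote_at and self.level < self.max_level:
                self.level += 1
                self.results.clear()  # judge the new level on fresh episodes
            elif rate <= self.demote_at and self.level > 0:
                self.level -= 1
                self.results.clear()
        return self.level
```

Clearing the window after each change prevents outcomes from the old difficulty level from triggering an immediate second move.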
