Transferability for General Reasoning: An Automated Curriculum for Multi-Domain RLVR
Researchers propose Transfer-Aware Curriculum (TAC), a machine learning optimization technique that dynamically adjusts training priorities across multiple domains by measuring how well improvements in one area transfer to others. The method achieves superior performance on reasoning tasks compared to fixed curricula, suggesting that cross-domain transferability is a critical factor for training more capable AI systems.
This research addresses a fundamental challenge in training large language models for complex reasoning: how to allocate computational resources across different skill domains efficiently. Traditional approaches either fix training distributions or adapt based only on local performance metrics, missing the broader picture of how skills reinforce each other across domains like mathematics, programming, and science.
The paper's core innovation lies in reusing signals already produced during training to estimate cross-domain gradient alignment at negligible computational cost. By measuring how much a gradient step on one domain benefits the entire training suite rather than just that domain, TAC identifies which domains produce the most broadly applicable improvements. This builds on recent progress in reinforcement learning with verifiable rewards (RLVR), where models are trained on problems whose solutions can be checked automatically.
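To make the idea of gradient alignment concrete, here is a minimal sketch, not the paper's actual estimator. It assumes each domain is summarized by a single gradient vector, scores a domain by its mean cosine similarity with the gradients of every domain in the suite, and turns the scores into sampling weights with a softmax. The function names, the `temperature` parameter, and the toy vectors are all hypothetical.

```python
import numpy as np


def transfer_scores(domain_grads):
    """Score each domain by how well its gradient aligns with the whole suite.

    domain_grads: dict mapping domain name -> 1-D gradient vector
    (hypothetical inputs; a real system would derive these from training).
    """
    names = list(domain_grads)
    grads = np.stack([np.asarray(domain_grads[n], dtype=float) for n in names])
    # Normalize each row so that dot products become cosine similarities.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    unit = grads / np.clip(norms, 1e-12, None)
    similarity = unit @ unit.T
    # Mean alignment with every domain in the suite, the domain itself included.
    scores = similarity.mean(axis=1)
    return {name: float(score) for name, score in zip(names, scores)}


def sampling_weights(scores, temperature=1.0):
    """Convert scores into a sampling distribution with a softmax (an assumption)."""
    names = list(scores)
    logits = np.array([scores[n] for n in names]) / temperature
    weights = np.exp(logits - logits.max())  # subtract the max for stability
    weights /= weights.sum()
    return {name: float(w) for name, w in zip(names, weights)}


# Toy gradients: "code" points partly along both "math" and "science".
toy_grads = {
    "math": [1.0, 0.0],
    "code": [1.0, 1.0],
    "science": [0.0, 1.0],
}
scores = transfer_scores(toy_grads)
weights = sampling_weights(scores, temperature=0.5)
```

In this toy setup, "code" receives the highest score and the largest sampling weight because its gradient is partially aligned with both other domains, which is the kind of broadly useful update a transfer-aware curriculum would favor.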
The results show meaningful improvements: on both tested models, TAC outperforms learnability-only baselines by up to 2.8 points (a 10% relative gain). This matters because training efficiency directly affects development costs and deployment timelines for advanced reasoning systems. TAC also remains robust when the training mixture is imbalanced, which suggests it can handle real-world settings where domains are not equally represented.
Looking forward, this approach could influence how organizations structure multi-task AI training pipelines. The methodology may extend beyond reasoning tasks to other domains requiring broad capability transfer. As models scale and training budgets grow, optimizing curriculum design becomes increasingly valuable for competitive advantage in developing capable reasoning systems.
- Transfer-Aware Curriculum dynamically prioritizes training domains based on cross-domain benefit rather than local improvement alone
- TAC achieves a 10% relative improvement over learnability-only baselines while adding less than 1% computational overhead
- The method uses gradient alignment to estimate transferability at negligible additional training cost
- Cross-domain transferability emerges as a critical signal for multi-domain reasoning model training
- Performance gains hold across both tested models and across different training mixture compositions