Atomic Skills are the Prerequisite: When Reinforcement Learning Synthesizes Compositional Reasoning, and When It Only Amplifies
Researchers demonstrate that reinforcement learning can synthesize novel compositional reasoning skills, but only when models first master independent atomic skills through supervised fine-tuning. Using a controlled synthetic dataset, they show SFT alone produces memorization without generalization, while RL bridges the gap to genuine skill integration when prerequisites are met.
This research addresses a fundamental question in machine learning: whether RL truly creates new capabilities or merely amplifies existing ones. The study uses an elegant experimental design built on a synthetic biography dataset to isolate two atomic skills, parametric reasoning (knowledge stored in weights) and contextual reasoning (information supplied in context), and then measures how well models combine them. The central finding is that supervised fine-tuning on composite tasks reaches high accuracy on combinations seen in training (90%) but fails badly on novel combinations (18%), which indicates that rote memorization dominates. Reinforcement learning reverses this pattern, but with a crucial prerequisite: the base model must first master each atomic skill independently through SFT.
The result has significant implications for AI development, because it suggests that complex reasoning emerges not from end-to-end training but from a structured, hierarchical approach. For practitioners building retrieval-augmented generation systems and continual learning applications, it implies that decomposing a complex task into its constituent skills and then orchestrating them with RL may prove more reliable than monolithic training. The research also supports a scalability principle: rather than trying to teach sophisticated reasoning directly, engineers should build robust atomic capabilities and use RL to compose them. This challenges prevailing end-to-end training paradigms and offers a practical blueprint for systems that must generalize beyond their training distribution.
- SFT on composite tasks produces memorization (90% seen, 18% novel), revealing the brittleness of direct supervised training on complex reasoning
- RL acts as a skill synthesizer only when atomic skills are pre-mastered via independent SFT, establishing a strict prerequisite for compositional reasoning
- Decoupled atomic training followed by RL offers a scalable alternative to end-to-end training for complex novel reasoning capabilities
- Complementary reasoning (integrating internal knowledge with external context) requires genuine skill integration rather than rote memorization for reliable performance
- The approach has direct applications to retrieval-augmented generation and continual learning systems that must generalize beyond training data
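To make the seen-versus-novel comparison concrete, here is a minimal Python sketch of how such a gap could be measured. It is not the study's code or data: the entity and relation names, the `split_combinations` and `accuracy` helpers, and the lookup-table "model" are all hypothetical. The lookup table stands in for the extreme case of pure memorization, so its scores are a property of the toy and not a reproduction of the reported 90% and 18%.

```python
import random

def split_combinations(entities, relations, held_out_fraction=0.25, seed=0):
    """Enumerate every (entity, relation) pair and hold some out as 'novel'."""
    pairs = [(e, r) for e in entities for r in relations]
    rng = random.Random(seed)
    rng.shuffle(pairs)
    cut = int(len(pairs) * held_out_fraction)
    return pairs[cut:], pairs[:cut]  # (seen, novel)

def accuracy(predict, examples, answer_key):
    """Fraction of examples where the prediction matches the answer key."""
    if not examples:
        return 0.0
    hits = sum(predict(ex) == answer_key[ex] for ex in examples)
    return hits / len(examples)

# Hypothetical toy data standing in for a synthetic biography dataset.
entities = ["alice", "bob", "carol", "dave"]
relations = ["employer", "birthplace", "university"]
answer_key = {(e, r): f"{r}_of_{e}" for e in entities for r in relations}

seen, novel = split_combinations(entities, relations)

# A pure memorizer: it stores the seen answers and knows nothing else.
memorized = {ex: answer_key[ex] for ex in seen}

def lookup_model(example):
    return memorized.get(example)

seen_acc = accuracy(lookup_model, seen, answer_key)
novel_acc = accuracy(lookup_model, novel, answer_key)
generalization_gap = seen_acc - novel_acc

print(f"seen={seen_acc:.2f} novel={novel_acc:.2f} gap={generalization_gap:.2f}")
```

A model that had genuinely learned to compose its atomic skills would be expected to narrow this gap. The key design choice is that the held-out pairs never appear in training, so novel-split accuracy cannot be explained by recall alone.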