On Distinguishing Capability Elicitation from Capability Creation in Post-Training: A Free-Energy Perspective
Researchers propose distinguishing between capability elicitation and capability creation in large language model post-training, arguing that the SFT vs. RL debate oversimplifies how models improve. The framework suggests post-training either reweights existing behaviors or expands what models can practically achieve, with significant implications for how AI development is understood and evaluated.
This research addresses a fundamental conceptual gap in how the AI community discusses model improvement. The paper challenges the conventional wisdom that supervised fine-tuning merely imitates while reinforcement learning discovers, proposing instead a more nuanced framework grounded in free-energy principles. The distinction between elicitation and creation turns on whether post-training stays within a model's existing behavioral capacity or genuinely expands it.
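The free-energy framing can be made concrete with the standard KL-regularized objective from which Gibbs-style reweighting falls out. The notation below (base policy $\pi_0$, reward $r$, temperature $\beta$) is mine, not necessarily the paper's; this is the textbook result, shown for intuition:

```latex
\min_{\pi} \; \mathcal{F}[\pi]
  = \mathbb{E}_{y \sim \pi}\!\left[-r(y)\right]
  + \beta \, \mathrm{KL}\!\left(\pi \,\|\, \pi_0\right)
\quad\Longrightarrow\quad
\pi^{*}(y) \;\propto\; \pi_0(y) \, \exp\!\left(r(y)/\beta\right)
```

The optimal policy is the base distribution reweighted by exponentiated reward, which is why, under this view, any $y$ with $\pi_0(y) = 0$ also has $\pi^{*}(y) = 0$: minimizing free energy redistributes mass but cannot place it outside the base model's support.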
The accessible support concept—the set of behaviors a model can realistically produce under computational constraints—becomes the critical boundary. Both SFT and RL function as distribution reweighting mechanisms, but they operate on different signals: demonstrations versus rewards. When training remains proximal to the base model, the primary effect is local reweighting rather than genuine capability expansion. This observation matters because it reframes success metrics around whether new behaviors represent true capability gains or merely probabilistic redistribution of existing ones.
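A toy sketch makes the reweighting-versus-creation point tangible. Assuming the Gibbs form above (my illustrative notation, not the paper's implementation), reward-driven reweighting can sharply favor behaviors the base model rarely produces, but it leaves zero-probability behaviors at zero:

```python
import math

def reweight(base_probs, rewards, beta=1.0):
    """KL-regularized reweighting: pi(y) proportional to pi0(y) * exp(r(y)/beta).

    Behaviors with zero base probability stay at zero, so this operation
    reshapes the distribution but cannot expand its support.
    """
    weights = [p * math.exp(r / beta) for p, r in zip(base_probs, rewards)]
    z = sum(weights)
    return [w / z for w in weights]

# Hypothetical four-behavior base model; behavior D is outside its support.
base = [0.70, 0.25, 0.05, 0.0]
rewards = [0.0, 1.0, 2.0, 10.0]  # reward most strongly favors the unreachable D

post = reweight(base, rewards, beta=0.5)
print([round(p, 3) for p in post])
```

Running this shifts most of the mass onto the rare-but-reachable behavior C, while D, despite the largest reward, remains at exactly zero. That is elicitation, not creation: expanding the support itself would require the mechanisms the paper points to, such as search, interaction, or tool use.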
For AI development practitioners, this distinction carries substantial weight. It suggests that post-training effectiveness depends less on methodology choice and more on whether training procedures access mechanisms for genuine capability expansion—search, interaction, tool integration, or information incorporation. Organizations pursuing capability creation should focus on architectural and process innovations beyond signal engineering.
Looking forward, this framework may influence how researchers evaluate post-training approaches and allocate resources between different improvement strategies. The analysis suggests that marginal refinements within existing behavioral spaces offer diminishing returns, while genuine capability breakthroughs require systematic approaches to expanding what models can access and execute. This perspective could reshape post-training research priorities toward interaction-based and search-based methods.
- Post-training should distinguish between capability elicitation (reweighting existing behaviors) and capability creation (expanding the reachable behavior space)
- SFT and RL both function as distribution reweighting with different signals, not fundamentally different mechanisms
- The accessible support framework defines whether post-training stays within existing model capacity or expands it
- True capability creation requires mechanisms beyond signal engineering, including search, interaction, and tool use
- Post-training effectiveness depends less on the SFT vs. RL framing and more on whether training expands behavioral reach