SkillOS: Learning Skill Curation for Self-Evolving Agents
Researchers introduce SkillOS, a reinforcement learning framework that enables LLM-based agents to autonomously curate and evolve reusable skills from experience rather than relying on manual intervention. The system pairs a frozen agent executor with a trainable skill curator that manages an external skill repository, demonstrating consistent improvements in effectiveness and efficiency across multi-turn and single-turn tasks while generalizing across different agent architectures.
SkillOS addresses a fundamental limitation in current LLM-based agent deployment: the inability to learn and improve from past interactions at scale. While language models excel at one-off problem solving, production systems require agents that accumulate knowledge over time through systematic skill refinement. This research targets that gap by automating skill curation, a process that has traditionally relied on manual effort or on heuristic rules that fail to capture complex, long-horizon dependencies.
The framework's main contribution is its training approach, which combines composite rewards with task-stream grouping to produce a learning signal for the skill curator. Because the agent executor is frozen and only the curator is trained, optimization is confined to how skills are selected and evolved, which keeps the executor's behavior stable. The observation that learned skills evolve into richly structured metadata suggests the system discovers organizational principles without explicit supervision.
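As a rough illustration of how a composite reward and task-stream grouping might fit together, the sketch below scores several curator samples on the same task stream and normalizes their rewards within the group. The summary does not specify the reward terms or the grouping rule, so everything here is an assumption: the `StreamRollout` record, the success and efficiency terms, the weights `w_success` and `w_efficiency`, and the group-relative normalization are all hypothetical stand-ins rather than the authors' formulation.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass
class StreamRollout:
    """Hypothetical record: the frozen executor run over one task stream,
    using the skill repository produced by one sampled curator rollout."""
    successes: List[bool]  # per-task success flags
    steps: List[int]       # per-task executor step counts
    max_steps: int         # per-task step budget


def composite_reward(r: StreamRollout,
                     w_success: float = 1.0,
                     w_efficiency: float = 0.2) -> float:
    """Weighted sum of effectiveness and efficiency (assumed form and weights)."""
    success_rate = mean(1.0 if s else 0.0 for s in r.successes)
    # Fraction of the step budget left unused, averaged over tasks.
    efficiency = 1.0 - mean(min(n, r.max_steps) / r.max_steps for n in r.steps)
    return w_success * success_rate + w_efficiency * efficiency


def group_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one task-stream group (assumed grouping rule).

    Each entry is one curator sample evaluated on the same stream, so the
    advantage measures how a sample did relative to its siblings.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(x - mu) / (sigma + eps) for x in rewards]


# Two curator samples evaluated on the same two-task stream.
group = [
    StreamRollout(successes=[True, True], steps=[2, 4], max_steps=10),
    StreamRollout(successes=[False, True], steps=[10, 6], max_steps=10),
]
rewards = [composite_reward(r) for r in group]
advantages = group_advantages(rewards)
```

Under these assumptions, the first sample solves more tasks in fewer steps, so it receives a positive advantage and the second a negative one. Comparing samples only within a shared stream keeps the signal from being dominated by differences in stream difficulty.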
For the AI development community, SkillOS represents progress toward more autonomous, self-improving systems that reduce operational overhead and dependency on human expertise. In enterprise AI deployments, continuous skill refinement of this kind could reduce maintenance costs. The generalization across executor backbones and task domains indicates the approach may apply beyond the tested scenarios.
The research establishes important baselines for future work in agent self-improvement, though real-world deployment would require validation in production environments with genuine task streams and domain-specific constraints. The framework opens questions about optimal skill granularity, scalability to thousands of skills, and how curation policies transfer across radically different task distributions.
- SkillOS automates skill curation for LLM agents through reinforcement learning, eliminating reliance on manual skill engineering.
- The system demonstrates consistent performance improvements over memory-free and memory-based baselines across multiple task types.
- Learned skills evolve into structured formats encoding higher-level meta-skills, suggesting emergent self-organization.
- The frozen executor and trainable curator architecture provides stability while enabling targeted optimization of skill management.
- The approach generalizes across different agent architectures and task domains, indicating broad potential applicability.
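The executor and curator split named in the takeaways can be pictured with a small loop: the frozen executor only reads from the skill repository, and the curator is the only component that writes to it. This is a minimal sketch under stated assumptions, not the paper's interface. The `SkillRepository` layout, the `add`/`update`/`delete` operation set, the callable signatures, and the toy stubs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SkillRepository:
    """External skill store mapping skill names to free-form text (assumed layout)."""
    skills: Dict[str, str] = field(default_factory=dict)

    def apply(self, op: str, name: str, body: str = "") -> None:
        # The add/update/delete operation set is an assumption for illustration.
        if op in ("add", "update"):
            self.skills[name] = body
        elif op == "delete":
            self.skills.pop(name, None)
        else:
            raise ValueError(f"unknown curation op: {op}")


# Frozen executor: (task, skills) -> trajectory text. Never trained here.
Executor = Callable[[str, Dict[str, str]], str]
# Trainable curator: (task, trajectory, skills) -> list of (op, name, body).
Curator = Callable[[str, str, Dict[str, str]], List[Tuple[str, str, str]]]


def run_stream(tasks: List[str], executor: Executor, curator: Curator,
               repo: SkillRepository) -> List[str]:
    """Process tasks in order; skills curated on earlier tasks are visible later."""
    trajectories = []
    for task in tasks:
        # The executor receives a copy, so it cannot modify the repository.
        trajectory = executor(task, dict(repo.skills))
        trajectories.append(trajectory)
        for op, name, body in curator(task, trajectory, dict(repo.skills)):
            repo.apply(op, name, body)
    return trajectories


# Toy stand-ins for the two models, for illustration only.
def toy_executor(task: str, skills: Dict[str, str]) -> str:
    return f"{task}|skills={sorted(skills)}"


def toy_curator(task: str, trajectory: str,
                skills: Dict[str, str]) -> List[Tuple[str, str, str]]:
    return [("add", f"skill_for_{task}", f"notes from: {trajectory}")]


repo = SkillRepository()
logs = run_stream(["t1", "t2"], toy_executor, toy_curator, repo)
```

In this toy run the second task sees the skill written after the first, which is the accumulation effect the architecture relies on. In a real system both callables would wrap LLM calls, and only the curator's parameters would receive gradient updates.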