Researchers introduce SkillMaster, a training framework that enables LLM agents to autonomously create, refine, and select skills during task execution rather than relying on external supervision. The system demonstrates 8.8-9.3% performance improvements over existing baselines on complex agent benchmarks, representing a significant step toward self-improving AI agents.
SkillMaster addresses a fundamental limitation in current LLM agent architectures: skills remain static, externally provided resources rather than dynamic capabilities that evolve through experience. This research moves beyond traditional supervised learning paradigms in which human experts must continuously design and update agent behaviors. The framework combines trajectory-informed skill review, counterfactual evaluation, and a novel DualAdv-GRPO training method that decouples task-solving actions from skill management, enabling more stable joint optimization.
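The paper's exact DualAdv-GRPO update is not reproduced here, but the core idea of separate advantage estimation can be sketched: task-solving rewards and skill-editing rewards each get their own group-relative normalization, so noise in one reward stream cannot swamp the learning signal of the other. The function names and reward values below are illustrative assumptions, not SkillMaster's actual API.

```python
import numpy as np

def group_relative_advantages(rewards):
    """GRPO-style advantage: normalize each rollout's reward against the
    mean and standard deviation of its sampling group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

def dual_advantages(task_rewards, skill_rewards):
    """Hypothetical sketch of the dual-advantage idea: task-solving and
    skill-editing rewards are normalized independently, decoupling the
    two objectives during joint optimization."""
    return (group_relative_advantages(task_rewards),
            group_relative_advantages(skill_rewards))

# A group of 4 sampled rollouts: binary task-success rewards alongside
# (noisier) skill-edit quality rewards.
adv_task, adv_skill = dual_advantages([1.0, 0.0, 1.0, 0.0],
                                      [0.2, 0.9, 0.4, 0.5])
```

In a policy-gradient step, tokens belonging to task actions would be weighted by `adv_task` and tokens belonging to skill-edit decisions by `adv_skill`, rather than mixing both rewards into one scalar.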
The work builds on growing recognition that autonomous adaptation mechanisms are critical for deploying agents in real-world environments where task distributions shift unpredictably. Previous approaches either froze skill sets or required explicit human intervention to update them. By treating skill development as a learned capability, SkillMaster enables agents to identify procedural failures and generate improvements iteratively.
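The create/refine/select loop described above can be sketched as a minimal skill bank. Everything here is an illustrative assumption (class names, the success-rate scoring, the review hook), not SkillMaster's implementation; it only shows how outcome tracking lets an agent prefer proven skills and rewrite failing ones.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    procedure: str   # natural-language or code recipe the agent follows
    successes: int = 0
    failures: int = 0

class SkillBank:
    """Hypothetical sketch of a create/refine/select skill lifecycle."""

    def __init__(self):
        self.skills = {}

    def create(self, name, procedure):
        self.skills[name] = Skill(name, procedure)

    def select(self, candidates):
        # Prefer the candidate with the best (Laplace-smoothed) success rate.
        scored = [s for n in candidates if (s := self.skills.get(n))]
        return max(
            scored,
            key=lambda s: (s.successes + 1) / (s.successes + s.failures + 2),
            default=None,
        )

    def review(self, name, succeeded, revised_procedure=None):
        # Trajectory-informed review: record the outcome and, on failure,
        # optionally replace the procedure with a refined version.
        s = self.skills[name]
        if succeeded:
            s.successes += 1
        else:
            s.failures += 1
            if revised_procedure:
                s.procedure = revised_procedure

bank = SkillBank()
bank.create("search", "type the query and press enter")
bank.create("filter", "open the filters panel")
bank.review("search", succeeded=True)
bank.review("filter", succeeded=False,
            revised_procedure="click the filter icon before selecting options")
```

After this review pass, `bank.select(["search", "filter"])` returns the `search` skill, while the failed `filter` skill carries its refined procedure into future tasks.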
The experimental validation on ALFWorld and WebShop—complex benchmarks requiring multi-step reasoning and tool use—demonstrates that autonomous skill mastery translates to measurable performance gains. The finding that agents trained this way can transfer skill improvements to novel tasks with minimal skill-bank edits suggests the approach learns generalizable adaptation strategies rather than overfitting to specific scenarios.
This advancement has implications for long-term AI agent reliability and scalability. Self-improving agents reduce dependency on continuous human feedback loops, which currently bottleneck deployment of complex AI systems. As LLM agents tackle increasingly sophisticated tasks, the ability to autonomously refine their own decision-making becomes essential for maintaining performance over extended operation periods.
- SkillMaster enables LLM agents to autonomously create, refine, and select skills based on task experience rather than external supervision
- The framework achieves 8.8-9.3% performance improvements over state-of-the-art baselines on the ALFWorld and WebShop benchmarks
- The DualAdv-GRPO training method stabilizes joint optimization of task-solving actions and skill-editing decisions through separate advantage estimation
- Agents trained with SkillMaster can identify skill failures and transfer improvements to future tasks with minimal skill-bank modifications
- The approach reduces reliance on human experts to manually design and update agent capabilities, improving scalability for complex deployments