SIRI: Self-Internalizing Reinforcement Learning with Intrinsic Skills for LLM Agent Training
Researchers introduce SIRI, a three-phase reinforcement learning framework that enables LLM agents to autonomously discover, validate, and internalize reusable skills without external skill generators or inference-time skill banks. Testing on ALFWorld and WebShop benchmarks shows meaningful performance improvements over baseline methods while reducing deployment complexity and latency.
SIRI addresses a fundamental engineering challenge in LLM agent development: the overhead of external skill management systems. Traditional skill-based approaches require either persistent skill retrieval at inference time, which increases context length and latency, or dependency on external skill generators during training, which complicates deployment pipelines. The framework's three phases (warm-up with GiGPO, self-skill mining, and distillation) represent a step toward more autonomous agent learning.
The methodology demonstrates practical advances in agent efficiency. By enabling agents to summarize their own successful trajectories into compact skills, then validate these skills through comparative rollouts, SIRI creates an internal feedback loop that requires minimal external machinery. The distillation phase specifically targets beneficial skills using trajectory-level utility and action-level advantage signals, avoiding unnecessary bloat in the final model.
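The validation and distillation steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Rollout` type, the definition of utility as a success-rate difference, and the "positive utility and positive advantage" selection rule are all assumptions made for the example, since the summary does not give the exact formulas.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    success: bool            # whether the episode reached its goal
    advantages: list[float]  # one advantage estimate per action (hypothetical)


def success_rate(rollouts: list[Rollout]) -> float:
    """Fraction of rollouts that succeeded."""
    return sum(r.success for r in rollouts) / len(rollouts)


def skill_utility(with_skill: list[Rollout], without_skill: list[Rollout]) -> float:
    """Assumed trajectory-level utility: success-rate gain from using the skill."""
    return success_rate(with_skill) - success_rate(without_skill)


def distillation_mask(rollout: Rollout, utility: float) -> list[bool]:
    """Assumed selection rule: keep an action for distillation only if the
    skill helped overall (utility > 0) and the action's advantage is positive."""
    return [utility > 0 and adv > 0 for adv in rollout.advantages]


# Comparative rollouts on the same tasks, with and without the mined skill.
with_skill = [
    Rollout(True, [0.5, -0.1]),
    Rollout(True, [0.2]),
    Rollout(False, [0.0]),
    Rollout(True, [0.1]),
]
without_skill = [
    Rollout(True, [0.0]),
    Rollout(False, [0.0]),
    Rollout(False, [0.0]),
    Rollout(False, [0.0]),
]

utility = skill_utility(with_skill, without_skill)
mask = distillation_mask(with_skill[0], utility)
```

Under these assumptions, a skill that does not raise the success rate contributes no actions to distillation, which is one way to read the claim that only beneficial skills are internalized.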
The reported gains vary by benchmark: scores rise from 0.908 to 0.930 on ALFWorld (+0.022) and from 0.728 to 0.813 on WebShop (+0.085), so the improvement is modest on ALFWorld and considerably larger on WebShop. The fact that self-mining achieves performance comparable to distillation with larger closed-source models suggests the framework extracts value efficiently from available resources.
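For readers who want to check the size of these gains, the short sketch below computes the absolute and relative change from the four scores quoted above. The `gains` helper is a name introduced here for illustration, and it assumes the scores are on a 0 to 1 scale.

```python
def gains(baseline: float, improved: float) -> tuple[float, float]:
    """Return (absolute change, relative change in percent)."""
    absolute = improved - baseline
    relative_percent = absolute / baseline * 100
    return absolute, relative_percent


# Scores as quoted in the article: (benchmark, baseline, SIRI).
results = [("ALFWorld", 0.908, 0.930), ("WebShop", 0.728, 0.813)]

for name, base, new in results:
    absolute, relative = gains(base, new)
    print(f"{name}: +{absolute:.3f} absolute, +{relative:.1f}% relative")
# ALFWorld: +0.022 absolute, +2.4% relative
# WebShop: +0.085 absolute, +11.7% relative
```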
For developers building production LLM agents, reduced inference complexity matters substantially. Eliminating skill banks and external generators directly impacts deployment speed and system reliability. The open-source availability creates opportunities for broader adoption and refinement across different agent architectures and domains.
- SIRI enables agents to autonomously discover and internalize skills without external skill generators or inference-time retrieval mechanisms
- Relative improvements of about 2.4% on ALFWorld (0.908 to 0.930) and 11.7% on WebShop (0.728 to 0.813) demonstrate performance gains over baseline methods
- The self-mining strategy achieves results comparable to distillation with larger language models, suggesting effective resource utilization
- Reduced deployment complexity and inference latency make the framework practically appealing for production LLM agent systems
- The open-source release creates a pathway for broader research and integration into diverse agent architectures