Skill or Skip? Learning Selective Skill Invocation in Agentic Tasks via Dual-Granularity Preference Learning
Researchers propose SelSkill, a machine learning framework that improves how AI agents decide whether to invoke a skill or skip it during task execution. The method reports higher task success and execution precision on the ALFWorld and BFCL benchmarks, addressing a gap in existing agentic AI systems that struggle with unnecessary skill invocations.
SelSkill addresses a fundamental problem in agentic AI systems: knowing not just which skills are relevant, but whether they should actually be used at a given moment. Previous research focused on skill selection and improvement but overlooked the execution-time decision of whether to invoke a skill at all. This distinction matters because invoking an irrelevant skill can inject confusing context and derail an otherwise sound execution path. The framework uses dual-granularity preference learning, which combines episode-level preferences (overall task success) with step-level preferences (local invocation effectiveness) to train the invocation policy.
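To make the idea of combining two preference granularities concrete, here is a minimal sketch. It is not the paper's objective: it assumes a logistic (Bradley-Terry style) preference loss over log-probabilities, and the names `preference_loss`, `dual_granularity_loss`, `step_weight`, and `beta` are hypothetical choices made for illustration.

```python
import math


def preference_loss(logp_preferred, logp_rejected, beta=1.0):
    """Logistic preference loss: -log(sigmoid(beta * (preferred - rejected))).

    Assumed form for illustration; the summary does not state the exact loss.
    """
    margin = beta * (logp_preferred - logp_rejected)
    # Numerically stable softplus(-margin), which equals -log(sigmoid(margin)).
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def _mean_loss(pairs, beta):
    """Average preference loss over (preferred, rejected) log-prob pairs."""
    if not pairs:
        return 0.0
    return sum(preference_loss(p, r, beta) for p, r in pairs) / len(pairs)


def dual_granularity_loss(episode_pairs, step_pairs, step_weight=0.5, beta=1.0):
    """Weighted sum of episode-level and step-level preference losses.

    episode_pairs: log-probs of whole trajectories (successful vs. failed).
    step_pairs:    log-probs of a single invoke/skip decision (better vs. worse).
    step_weight:   hypothetical mixing coefficient between the two levels.
    """
    episode_term = _mean_loss(episode_pairs, beta)
    step_term = _mean_loss(step_pairs, beta)
    return (1.0 - step_weight) * episode_term + step_weight * step_term


# Toy usage: the policy already prefers the better option in both pairs.
loss = dual_granularity_loss(
    episode_pairs=[(-4.0, -6.0)],
    step_pairs=[(-0.3, -1.5)],
)
```

In this sketch the episode-level term rewards trajectories that succeed overall, while the step-level term rewards the better local invoke-or-skip choice; the mixing weight controls how much each contributes.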
The technical approach uses predictive uncertainty to identify critical decision points, then constructs invoke-skip preference pairs from shared trajectory prefixes, which yields training signal without requiring entirely separate trajectories. The reported results are substantial: on ALFWorld with Qwen3-8B, the success rate rose by 10.9 percentage points and execution precision by 29.1%. On BFCL, success improved by 5.7 percentage points and precision by 29.5%. Critically, zero-shot transfer to Tau-bench and PopQA shows that the learned invocation policy generalizes to new domains with previously unseen skills, suggesting the framework captures generalizable decision logic rather than memorizing skill patterns.
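The pair-construction step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: predictive uncertainty is approximated here by the binary entropy of a hypothetical invoke probability, and `score_branch` is a hypothetical callable standing in for whatever rollout or evaluation produces an outcome score for each branch.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) invoke/skip decision."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def critical_steps(invoke_probs: Sequence[float], threshold: float = 0.9) -> List[int]:
    """Indices of steps where the policy is most uncertain (assumed criterion)."""
    return [t for t, p in enumerate(invoke_probs) if binary_entropy(p) >= threshold]


@dataclass
class PreferencePair:
    prefix: Tuple[str, ...]  # shared trajectory prefix up to the decision point
    chosen: str              # "invoke" or "skip"
    rejected: str


def build_pairs(
    trajectory: Sequence[str],
    invoke_probs: Sequence[float],
    score_branch: Callable[[Tuple[str, ...], str], float],
    threshold: float = 0.9,
) -> List[PreferencePair]:
    """Branch invoke vs. skip from the same prefix at each uncertain step."""
    pairs = []
    for t in critical_steps(invoke_probs, threshold):
        prefix = tuple(trajectory[:t])
        invoke_score = score_branch(prefix, "invoke")
        skip_score = score_branch(prefix, "skip")
        if invoke_score == skip_score:
            continue  # no preference signal when both branches score the same
        if invoke_score > skip_score:
            pairs.append(PreferencePair(prefix, "invoke", "skip"))
        else:
            pairs.append(PreferencePair(prefix, "skip", "invoke"))
    return pairs


# Toy usage with a made-up scorer: skipping is better only after one step.
def toy_score(prefix, action):
    return 1.0 if (action == "skip") == (len(prefix) == 1) else 0.0


steps = ["look", "open drawer", "take mug", "go to sink"]
probs = [0.99, 0.5, 0.05, 0.45]
pairs = build_pairs(steps, probs, toy_score)
```

Because both branches start from the same prefix, the only difference between the chosen and rejected samples is the invoke-or-skip decision itself, which is what lets a pair isolate the effect of that single choice.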
For the AI development community, this represents progress toward more reliable autonomous agents. The work signals that fine-grained control over agent behavior, meaning the decision of whether to act and not only what to do, becomes increasingly important at scale. This has implications for agentic systems deployed in real-world applications, where unnecessary operations compound costs and introduce failure modes.
- SelSkill introduces dual-granularity preference learning to improve selective skill invocation decisions in AI agents.
- The framework achieves 10.9 percentage point improvement in task success on ALFWorld benchmarks with enhanced execution precision.
- Zero-shot transfer to unseen domains suggests the learned policy generalizes beyond training environments.
- The method addresses the overlooked problem of whether relevant skills should be invoked at specific decision points.
- Episode-level and step-level preference combination enables agents to balance overall task quality with local invocation effectiveness.