SearchSkill: Teaching LLMs to Use Search Tools with Evolving Skill Banks
SearchSkill is a new framework that teaches language models to perform more effective web searches by planning queries explicitly through reusable skill cards rather than treating search as an undifferentiated action. The system maintains an evolving skill bank that learns from failure patterns, yielding better performance on knowledge-intensive QA tasks with fewer wasted queries and higher reasoning accuracy.
SearchSkill addresses a fundamental limitation in how current language models approach information retrieval: the tendency to issue broad, generic, or redundant search queries that waste retrieval budget and degrade downstream reasoning. Rather than treating search as a simple binary decision or generic action, the framework introduces explicit query planning through skill cards, reusable templates that condition both what to search for and how to formulate the query. This marks a shift in agent design, away from monolithic search actions and toward modular, interpretable components.
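To make the idea concrete, here is a minimal sketch of what a skill card might look like as a data structure. The field names (`name`, `trigger`, `query_template`) and the `render` helper are illustrative assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class SkillCard:
    """One reusable query-planning template (field names are illustrative)."""
    name: str                 # e.g. "entity_disambiguation"
    trigger: str              # when this skill applies
    query_template: str       # how the query should be phrased, with slots
    examples: list[str] = field(default_factory=list)  # few-shot demonstrations

    def render(self, **slots: str) -> str:
        """Fill the template's slots to produce one concrete, atomic query."""
        return self.query_template.format(**slots)

card = SkillCard(
    name="entity_disambiguation",
    trigger="question mentions an ambiguous named entity",
    query_template='"{entity}" {disambiguating_context}',
)
print(card.render(entity="Mercury", disambiguating_context="Roman god mythology"))
# -> "Mercury" Roman god mythology
```

The point of the template is that it fixes both dimensions the framework cares about: what to search for (the slots) and how to formulate the query (the surrounding phrasing).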
The innovation builds on a growing recognition that LLM capability depends heavily on tool-usage patterns. Open-domain question answering has consistently shown that many failures stem not from reasoning deficits but from poor retrieval decisions. By maintaining a dynamic SkillBank that expands in response to recurring failure patterns, SearchSkill creates a learning loop that improves over time and transfers across model sizes. The two-stage training recipe (selecting a skill, then executing a skill-grounded action) mirrors realistic inference constraints and improves generalization.
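A compact sketch of both mechanisms follows, reusing the `SkillCard` dataclass from the snippet above. The word-overlap selector, the recurrence threshold, and the method names are simplified stand-ins for the paper's learned components, not its actual recipe.

```python
from collections import Counter

class SkillBank:
    def __init__(self, cards: list[SkillCard], recurrence_threshold: int = 3):
        self.cards = cards
        self.failure_counts: Counter[str] = Counter()  # failure-pattern tallies
        self.threshold = recurrence_threshold

    def select(self, question: str) -> SkillCard:
        """Stage 1: choose the card whose trigger best matches the question.
        A trained model would make this choice; word overlap stands in here."""
        words = set(question.lower().split())
        return max(self.cards, key=lambda c: len(words & set(c.trigger.split())))

    def record_failure(self, pattern: str, remedial_card: SkillCard) -> None:
        """Grow the bank: once a labelled failure pattern recurs often enough,
        admit a remedial card so affected trajectories can be reconstructed
        and used for targeted training."""
        self.failure_counts[pattern] += 1
        if self.failure_counts[pattern] == self.threshold:
            self.cards.append(remedial_card)

def skill_grounded_step(question: str, bank: SkillBank, **slots: str) -> str:
    """Stage 2: execute the search action through the selected card's template."""
    return bank.select(question).render(**slots)
```

Separating selection from execution means the model never emits a free-form query directly; every search action is grounded in a card the bank can audit, replace, or retrain against.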
For developers and researchers, SearchSkill offers practical improvements: better retrieval efficiency within constrained budgets, fewer redundant queries, and more atomic, focused search steps. These benefits directly translate to lower inference costs and faster response times. The framework's compatibility with both open-source and closed-source models suggests broad applicability across the AI ecosystem.
The emphasis on explainability through skill selection also has downstream implications for AI trustworthiness and debugging, as researchers and practitioners can audit which skills models rely on and identify systematic query failures.
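Because every search step is conditioned on a named skill, a simple audit log is enough to surface systematic failures. The log schema and helper below are hypothetical, sketching how such an audit might look in practice.

```python
from collections import defaultdict

# Hypothetical audit log: (question, selected skill, answered correctly).
skill_log: list[tuple[str, str, bool]] = []

def skill_failure_rates(log: list[tuple[str, str, bool]]) -> dict[str, float]:
    """Tally per-skill failure rates from the audit log."""
    stats = defaultdict(lambda: [0, 0])  # skill -> [failures, uses]
    for _question, skill, correct in log:
        stats[skill][1] += 1
        stats[skill][0] += 0 if correct else 1
    return {skill: fails / uses for skill, (fails, uses) in stats.items()}

skill_log += [
    ("Who discovered penicillin?", "entity_lookup", True),
    ("Population of Lagos in 1990?", "temporal_fact", False),
    ("Population of Tokyo in 1980?", "temporal_fact", False),
]
print(skill_failure_rates(skill_log))
# {'entity_lookup': 0.0, 'temporal_fact': 1.0} -> the 'temporal_fact' card needs work
```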
- SearchSkill teaches LLMs to plan queries explicitly through reusable skill cards rather than issuing generic search commands
- An evolving SkillBank improves from failure patterns and reconstructs affected trajectories for targeted training
- The framework reduces wasted retrieval budget by producing more atomic, focused queries on knowledge-intensive QA tasks
- Performance gains appear across both open-source and closed-source models with improved exact match scores
- Explicit skill-conditioned planning offers better interpretability and debugging compared to undifferentiated search actions