🧠 AI | 🟢 Bullish | Importance 7/10

SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating

arXiv – CS AI|Zequn Xie, Junjie Wang, Dan Yang, Jie Feng, Yue Shen, Jian Wang, Jinjie Gu|June 8, 2026 at 04:00 AM

🤖 AI Summary

Researchers introduce SlimSearcher, a framework that trains AI web agents to perform complex information-seeking tasks with 17-58% fewer tool calls while maintaining or improving accuracy. The approach combines efficient trajectory filtering during supervised fine-tuning with adaptive reward gating during reinforcement learning to eliminate wasteful search behaviors.

Analysis

SlimSearcher targets an inefficiency in current deep research agents: they are optimized for accuracy with little regard for computational cost, which produces unnecessarily long reasoning chains and excessive API calls. The framework rests on the principle that agent behavior should balance correctness against resource consumption, since in real deployments computational efficiency directly affects operational cost and latency.

The approach has two stages. During SFT, the training data is filtered to keep trajectories that are both successful and economical, so the model acquires efficiency awareness early. During RL training, adaptive reward gating evaluates each rollout's tool and token usage relative to peer performance instead of applying a fixed penalty. Relative evaluation has two advantages over absolute cost penalties: it accounts for task complexity, and it avoids both the brevity bias that emerges when models are penalized for longer sequences and the reward hacking in which models learn to game the metric rather than genuinely improve efficiency.

Results on GAIA, BrowseComp, and XBenchDeepSearch show consistent improvements, which suggests the approach generalizes across benchmarks. For AI infrastructure providers and organizations deploying agentic systems at scale, fewer tool calls translate directly into lower API costs, faster response times, and a better user experience. The work reflects a growing industry view that, for deployed systems, practical efficiency can matter more than raw frontier capability.
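To make the idea of peer-relative reward gating concrete, here is a minimal Python sketch. It is not the paper's formula, which this summary does not give. The `Rollout` fields, the `gated_rewards` function, and the `bonus_weight` and `tool_share` parameters are all hypothetical. The sketch assumes that "peers" are the rollouts sampled for the same question, that incorrect rollouts receive no efficiency credit, and that correct rollouts are scored against the average cost of the other correct rollouts.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Rollout:
    correct: bool    # did the final answer match the reference?
    tool_calls: int  # number of tool-call rounds used
    tokens: int      # number of tokens generated


def gated_rewards(group, bonus_weight=0.2, tool_share=0.5):
    """Hypothetical peer-relative reward for one group of rollouts.

    Assumption: all rollouts in `group` answer the same question.
    """
    correct = [r for r in group if r.correct]
    if not correct:
        # No correct peer to compare against: nobody earns a reward.
        return [0.0] * len(group)

    avg_tools = mean(r.tool_calls for r in correct)
    avg_tokens = mean(r.tokens for r in correct)

    rewards = []
    for r in group:
        if not r.correct:
            # Gate closed: a short but wrong rollout gets no efficiency credit.
            rewards.append(0.0)
            continue
        # Relative savings versus correct peers (positive means cheaper).
        tool_saving = (avg_tools - r.tool_calls) / max(avg_tools, 1)
        token_saving = (avg_tokens - r.tokens) / max(avg_tokens, 1)
        saving = tool_share * tool_saving + (1 - tool_share) * token_saving
        # Clip so the efficiency term can never outweigh correctness.
        saving = max(-1.0, min(1.0, saving))
        rewards.append(1.0 + bonus_weight * saving)
    return rewards


group = [Rollout(True, 4, 1000), Rollout(True, 8, 3000), Rollout(False, 1, 100)]
print(gated_rewards(group))
```

In this sketch the cheapest rollout overall is the incorrect one, yet it scores lowest, which is the property the analysis attributes to gating: brevity alone is not rewarded. Because costs are compared with peers on the same question, a hard question that legitimately needs many tool calls is not penalized in absolute terms.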

Key Takeaways

→ SlimSearcher reduces average tool-call rounds by 17-58% while maintaining or improving accuracy on benchmark tasks
→ Adaptive reward gating avoids brevity bias by evaluating efficiency relative to peer performance rather than applying absolute penalties
→ Pareto-efficient trajectory filtering during training distills search behaviors that are both successful and economical
→ The framework treats computational cost as a primary optimization target alongside correctness
→ Results on long-horizon benchmarks are consistent, suggesting the approach generalizes across diverse information-seeking tasks
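
The Pareto-efficient trajectory filtering mentioned in the takeaways can be sketched as follows. This is one illustrative reading, not the authors' implementation: the `Trajectory` fields and the choice of tool calls and tokens as the two cost axes are assumptions. A correct trajectory is kept unless another correct trajectory for the same question costs no more on both axes and strictly less on at least one.

```python
from typing import NamedTuple


class Trajectory(NamedTuple):
    question_id: str
    correct: bool
    tool_calls: int
    tokens: int


def dominates(a, b):
    """True if `a` is at least as cheap as `b` on both axes and cheaper on one."""
    return (
        a.tool_calls <= b.tool_calls
        and a.tokens <= b.tokens
        and (a.tool_calls < b.tool_calls or a.tokens < b.tokens)
    )


def pareto_filter(trajectories):
    """Keep correct trajectories on the per-question cost Pareto front."""
    by_question = {}
    for t in trajectories:
        if t.correct:  # unsuccessful trajectories are never used for SFT here
            by_question.setdefault(t.question_id, []).append(t)

    kept = []
    for candidates in by_question.values():
        kept.extend(
            t for t in candidates
            if not any(dominates(other, t) for other in candidates)
        )
    return kept


trajectories = [
    Trajectory("q1", True, 3, 900),
    Trajectory("q1", True, 5, 700),    # more calls but fewer tokens: kept
    Trajectory("q1", True, 6, 1200),   # dominated by the first: dropped
    Trajectory("q1", False, 1, 100),   # cheap but wrong: dropped
    Trajectory("q2", True, 10, 5000),  # only correct answer for q2: kept
]
for t in pareto_filter(trajectories):
    print(t)
```

Filtering per question rather than globally keeps expensive trajectories for hard questions when nothing cheaper succeeded, so the filtered set still covers difficult tasks.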