Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses
Researchers introduce Harness-1, a 20B parameter search agent that separates semantic decision-making from state management by externalizing working memory to a stateful harness environment. The system achieves 73% average curated recall across eight retrieval benchmarks, outperforming comparable open-source searchers by 11.4 points while generalizing well to held-out transfer tasks.
Harness-1 addresses a fundamental architectural inefficiency in reinforcement learning-based search agents: forcing neural policies to simultaneously optimize search strategy and manage routine bookkeeping creates unnecessary cognitive load on the model. By moving working memory, candidate pools, evidence tracking, and context management to an external harness, the researchers allow the policy to focus exclusively on high-level semantic decisions like what to search for, which documents matter, and when to stop searching.
This separation of concerns builds on established principles in software engineering and cognitive science. Traditional search systems have long maintained explicit state structures, yet recent neural approaches conflate policy learning with state tracking. The Harness-1 approach represents a pragmatic hybrid: it uses reinforcement learning for nuanced decision-making while letting deterministic algorithms handle reliable bookkeeping. At 20B parameters, the model remains modest compared to frontier models, yet it achieves competitive performance through architectural design rather than scale.
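The division of labor described above can be illustrated with a small sketch. This is not the Harness-1 implementation: the `Harness` class, the three-action format (`search`, `keep`, `stop`), the toy index, and the scripted policy are all hypothetical names chosen for illustration. It only shows the general pattern of a policy that makes decisions while a deterministic environment owns the candidate pool, evidence set, and query history.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical action format: ("search", query), ("keep", doc_id), or ("stop", "").
Action = Tuple[str, str]
Observation = Dict[str, List[str]]


@dataclass
class Harness:
    """Deterministic bookkeeping: query history, candidate pool, kept evidence."""
    search_fn: Callable[[str], List[str]]  # assumed retriever: query -> doc ids
    queries: List[str] = field(default_factory=list)
    candidates: List[str] = field(default_factory=list)
    evidence: List[str] = field(default_factory=list)

    def observation(self) -> Observation:
        """Return a copy of the state so the policy can read but not mutate it."""
        return {
            "queries": list(self.queries),
            "candidates": list(self.candidates),
            "evidence": list(self.evidence),
        }

    def step(self, action: Action) -> bool:
        """Apply one policy action; return False once the episode should end."""
        kind, arg = action
        if kind == "search":
            self.queries.append(arg)
            for doc_id in self.search_fn(arg):
                if doc_id not in self.candidates:  # deduplicate the pool
                    self.candidates.append(doc_id)
        elif kind == "keep":
            # Only documents already in the pool can become evidence.
            if arg in self.candidates and arg not in self.evidence:
                self.evidence.append(arg)
        elif kind == "stop":
            return False
        else:
            raise ValueError(f"unknown action kind: {kind!r}")
        return True


def run_episode(policy: Callable[[Observation], Action],
                harness: Harness, max_steps: int = 8) -> List[str]:
    """Alternate policy decisions and harness updates until stop or budget."""
    for _ in range(max_steps):
        if not harness.step(policy(harness.observation())):
            break
    return harness.evidence


# Toy stand-ins for a retriever and a learned policy (illustration only).
TOY_INDEX = {"solar": ["d1", "d2"], "wind": ["d2", "d3"]}


def toy_search(query: str) -> List[str]:
    return TOY_INDEX.get(query, [])


def scripted_policy(obs: Observation) -> Action:
    if not obs["queries"]:
        return ("search", "solar")
    if len(obs["queries"]) == 1:
        return ("search", "wind")
    if "d2" in obs["candidates"] and "d2" not in obs["evidence"]:
        return ("keep", "d2")
    return ("stop", "")


harness = Harness(search_fn=toy_search)
result = run_episode(scripted_policy, harness)
print(result)
```

In this sketch the policy never deduplicates results or tracks what it has already kept; it only reads the observation and picks the next action. In a trained system, a learned policy would stand in for `scripted_policy`, while the harness logic would stay deterministic.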
The benchmark results demonstrate meaningful improvements across heterogeneous domains—web search, financial documents, patents, and multi-hop question answering—with particularly strong gains on transfer benchmarks. This suggests the learned behaviors capture generalizable search principles rather than memorizing training domain artifacts. The open-source release enables community validation and potential integration into production systems.
Looking forward, this work questions the prevailing assumption that larger models necessarily solve complex agentic tasks. The generalization strength on held-out benchmarks hints that explicit state management could enable more sample-efficient and interpretable agent training, potentially influencing how search and reasoning systems are designed across both research and commercial applications.
- →Externalizing state management from neural policies improves search agent performance and generalization without requiring larger models.
- →The 20B-parameter Harness-1 achieves 73% average curated recall, outperforming comparable open-source search agents by 11.4 points.
- →The architecture separates semantic decisions from bookkeeping, allowing reinforcement learning to optimize only high-level search strategy.
- →Strong transfer benchmark performance suggests learned behaviors generalize beyond training domains rather than overfitting.
- →The open-source release enables broader adoption and lets the community test whether architectural design can rival scale-based improvements.