Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts
Researchers introduce Retrospective Harness Optimization (RHO), a self-supervised method that enables AI agents to improve their capabilities using only historical trajectory data without requiring external validation sets. The approach improved performance on SWE-Bench Pro from 59% to 78% pass rate in a single optimization round, demonstrating practical effectiveness across software engineering, technical work, and knowledge domains.
RHO addresses a fundamental challenge in AI agent development: the scarcity of labeled data for continuous improvement in real-world deployments. Traditional optimization methods depend on ground-truth validation sets, creating a bottleneck when deploying agents in production environments where acquiring labeled data is expensive or infeasible. This research presents an autonomous feedback loop where agents leverage their own past performance to identify areas for improvement without external supervision.
The approach combines several self-directed mechanisms: identifying diverse, challenging tasks from historical trajectories, executing parallel rollouts, and employing self-validation to evaluate alternative harness configurations. The agent then selects improvements based on pairwise self-preference comparisons, creating a bootstrapping mechanism that strengthens performance iteratively. This methodology aligns with broader trends in autonomous AI systems that reduce dependency on human annotation and external grading infrastructure.
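The selection step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Rollout`, `count_pairwise_wins`, `best_harness`, and `toy_judge` are hypothetical, and the toy judge is a deterministic stand-in for the agent's own self-preference judgment, which in practice would come from the model comparing two trajectories.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Rollout:
    harness: str      # hypothetical name of the harness configuration used
    task: str         # task identifier drawn from historical trajectories
    transcript: str   # record of what the agent did on this rollout


# A judge returns the preferred rollout of a pair, or None for no preference.
Judge = Callable[[Rollout, Rollout], Optional[Rollout]]


def count_pairwise_wins(rollouts: List[Rollout], prefer: Judge) -> Dict[str, int]:
    """Count pairwise wins per harness, comparing only rollouts of the same task."""
    wins: Dict[str, int] = {r.harness: 0 for r in rollouts}
    by_task: Dict[str, List[Rollout]] = {}
    for r in rollouts:
        by_task.setdefault(r.task, []).append(r)
    for group in by_task.values():
        for a, b in combinations(group, 2):
            if a.harness == b.harness:
                continue  # only compare rollouts from different harnesses
            winner = prefer(a, b)
            if winner is not None:
                wins[winner.harness] += 1
    return wins


def best_harness(wins: Dict[str, int]) -> str:
    """Pick the harness with the most wins; ties resolve alphabetically."""
    return max(sorted(wins), key=wins.get)


def toy_judge(a: Rollout, b: Rollout) -> Optional[Rollout]:
    """Assumption: stand-in for self-preference; favors transcripts with passing tests."""
    score_a = int("tests passed" in a.transcript)
    score_b = int("tests passed" in b.transcript)
    if score_a == score_b:
        return None
    return a if score_a > score_b else b


rollouts = [
    Rollout("baseline", "t1", "edited file; tests failed"),
    Rollout("candidate", "t1", "edited file; tests passed"),
    Rollout("baseline", "t2", "tests passed"),
    Rollout("candidate", "t2", "tests passed"),
]
wins = count_pairwise_wins(rollouts, toy_judge)
print(wins, best_harness(wins))
```

Restricting comparisons to rollouts of the same task keeps each preference judgment like-for-like. How the method itself samples tasks, validates rollouts, and aggregates preferences is not specified here, so those parts are left out of the sketch.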
The empirical results carry significant implications for practical AI deployment. A 19-percentage-point improvement on SWE-Bench Pro (from 59% to 78%) without external validation suggests that self-preference mechanisms can effectively drive capability gains. The sustained accuracy improvements during extended sessions indicate that optimized harnesses produce behavioral changes that persist beyond the initial optimization round, pointing to genuine capability improvement rather than narrow overfitting.
This work positions autonomous self-improvement as a viable path toward more capable AI agents in production. Future work should explore whether RHO scales to more complex tasks and whether multiple optimization cycles compound the gains seen in a single round. The method's success in software engineering also raises the question of how well it transfers to other specialized fields that require deep domain knowledge.
- RHO enables AI agents to self-improve using historical data without external validation or labeled datasets
- Single optimization round improved SWE-Bench Pro performance from 59% to 78% pass rate
- Self-consistency and self-preference mechanisms allow agents to autonomously select effective capability improvements
- Optimized harnesses demonstrate sustained performance gains across long-horizon tasks, not just isolated improvements
- Method works across diverse domains including software engineering, technical work, and knowledge-based tasks