
Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts

arXiv – CS AI | Wenbo Pan, Shujie Liu, Chin-Yew Lin, Jingying Zeng, Xianfeng Tang, Xiangyang Zhou, Yan Lu, Xiaohua Jia | June 5, 2026 at 04:00 AM

AI Summary

Researchers introduce Retrospective Harness Optimization (RHO), a self-supervised method that lets AI agents improve using only their own historical trajectory data, with no external validation set. A single optimization round raised the SWE-Bench Pro pass rate from 59% to 78%, with gains also reported across software engineering, technical work, and knowledge tasks.

Analysis

RHO addresses a fundamental challenge in AI agent development: the scarcity of labeled data for continuous improvement in real-world deployments. Traditional optimization methods depend on ground-truth validation sets, creating a bottleneck when deploying agents in production environments where acquiring labeled data is expensive or infeasible. This research presents an autonomous feedback loop where agents leverage their own past performance to identify areas for improvement without external supervision.

The approach combines several self-directed mechanisms: identifying diverse, challenging tasks from historical trajectories, executing parallel rollouts, and employing self-validation to evaluate alternative harness configurations. The agent then selects improvements based on pairwise self-preference comparisons, creating a bootstrapping mechanism that strengthens performance iteratively. This methodology aligns with broader trends in autonomous AI systems that reduce dependency on human annotation and external grading infrastructure.
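The loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: the function names (select_tasks, run_rollout, prefer, pick_harness), the record fields, and the "max_retries" harness setting are all hypothetical, and the rollout and preference functions are deterministic stand-ins for what would be real agent runs and model-based judgments.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def select_tasks(history, k):
    """Choose up to k past tasks: failures first, one per category before repeats."""
    ordered = sorted(history, key=lambda r: r["passed"])  # False sorts before True
    chosen, seen = [], set()
    for record in ordered:
        if record["category"] not in seen:
            chosen.append(record)
            seen.add(record["category"])
    for record in ordered:  # fill any remaining slots, still failures first
        if record not in chosen:
            chosen.append(record)
    return chosen[:k]


def run_rollout(harness, record):
    """Toy stand-in for an agent run (assumption): the self-check passes
    when the harness retry budget covers the task difficulty."""
    return {
        "self_check": harness["max_retries"] >= record["difficulty"],
        "steps": min(harness["max_retries"], record["difficulty"]),
    }


def prefer(a, b):
    """Toy stand-in for pairwise self-preference: returns 0, 1, or None (tie)."""
    if a["self_check"] != b["self_check"]:
        return 0 if a["self_check"] else 1
    if a["steps"] != b["steps"]:
        return 0 if a["steps"] < b["steps"] else 1
    return None


def pick_harness(history, candidates, k=3):
    """Return the candidate harness that wins the most pairwise comparisons."""
    tasks = select_tasks(history, k)
    with ThreadPoolExecutor() as pool:  # rollouts for each harness run in parallel
        rollouts = [
            list(pool.map(lambda t, h=h: run_rollout(h, t), tasks))
            for h in candidates
        ]
    wins = [0] * len(candidates)
    for i, j in combinations(range(len(candidates)), 2):
        for a, b in zip(rollouts[i], rollouts[j]):
            winner = prefer(a, b)
            if winner is not None:
                wins[(i, j)[winner]] += 1
    return candidates[max(range(len(candidates)), key=wins.__getitem__)]


# Hypothetical history and candidate harnesses, for illustration only.
history = [
    {"task": "fix-bug", "category": "swe", "difficulty": 3, "passed": False},
    {"task": "write-doc", "category": "docs", "difficulty": 1, "passed": True},
    {"task": "refactor", "category": "swe", "difficulty": 2, "passed": False},
    {"task": "lookup", "category": "knowledge", "difficulty": 2, "passed": False},
]
candidates = [
    {"name": "baseline", "max_retries": 1},
    {"name": "more-retries", "max_retries": 3},
]
best = pick_harness(history, candidates)
print(best["name"])
```

No ground-truth labels appear anywhere in the sketch: the only signals are each rollout's own self-check and the pairwise comparison between rollouts, which is the property the article attributes to RHO.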

The empirical results matter for practical AI deployment. A 19-percentage-point gain on SWE-Bench Pro (59% to 78%, roughly a 32% relative increase) without external validation suggests that self-preference mechanisms can drive real capability improvements. The sustained accuracy gains during extended sessions indicate that the optimized harness changes agent behavior in ways that persist beyond the initial optimization round, pointing to a general improvement rather than narrow overfitting.

This work positions autonomous self-improvement as a viable path toward more capable AI agents in production. Future developments should explore whether RHO scales to more complex domains and whether multiple optimization cycles compound improvements. The methodology's success in software engineering contexts raises questions about applicability to other specialized domains requiring deep domain knowledge.

Key Takeaways

→ RHO enables AI agents to self-improve using historical data, without external validation or labeled datasets
→ A single optimization round improved the SWE-Bench Pro pass rate from 59% to 78%
→ Self-consistency and self-preference mechanisms allow agents to autonomously select effective capability improvements
→ Optimized harnesses show sustained performance gains across long-horizon tasks, not just isolated improvements
→ The method works across diverse domains, including software engineering, technical work, and knowledge-based tasks

#llm-agents #self-supervised-learning #ai-optimization #autonomous-improvement #harness-optimization #agent-capabilities #trajectory-rollouts #software-engineering-ai
