#software-engineering-ai News & Analysis

3 articles tagged with #software-engineering-ai. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 5 · 7/10

Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts

Researchers introduce Retrospective Harness Optimization (RHO), a self-supervised method that lets AI agents improve using only historical trajectory data, with no external validation set. In a single optimization round, the approach raised the pass rate on SWE-Bench Pro from 59% to 78%, demonstrating practical effectiveness across software engineering, technical work, and knowledge domains.

AI · Bearish · arXiv – CS AI · May 28 · 7/10

Benchmarks are Not Enough: RAMP for Runtime Assessing of Agentic Models in Production Systems

Researchers introduce RAMP, a production-grounded assessment framework that reveals significant performance degradation in LLM agents under real-world conditions, with task completion rates collapsing from 100% to 20% across serial workflows. Testing 15 mainstream models shows that traditional benchmarks mask critical failures in long-horizon execution chains, while computational costs vary by three orders of magnitude between comparable models.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

Debugging the Debuggers: Failure-Anchored Structured Recovery for Software Engineering Agents

Researchers present PROBE, a framework that improves how AI software engineering agents recover from failures by converting runtime telemetry into structured diagnoses and bounded recovery guidance. The system achieves 65% diagnosis accuracy and a 21.8% recovery rate on previously unresolved cases, and a prototype deployed at Microsoft shows practical viability without disrupting existing workflows.