
Socratic-SWE: Self-Evolving Coding Agents via Trace-Derived Agent Skills

arXiv – CS AI | Chuan Xiao, Zhengbo Jiao, Shaobo Wang, Wei Wang, Bing Zhao, Hu Wei, Linfeng Zhang, Lin Qu | June 8, 2026 at 04:00 AM

AI Summary

Socratic-SWE introduces a self-evolving framework that improves LLM-driven software engineering agents by distilling their solving traces into structured skills that guide targeted task generation. The approach achieves 50.40% on SWE-bench Verified after three iterations, demonstrating that agent weaknesses can fuel scalable, execution-validated training data creation without manual intervention.

Analysis

Socratic-SWE addresses a critical bottleneck in AI agent development: the scarcity of high-quality training data for software engineering tasks. Traditional synthetic data generation relies on fixed mutation rules disconnected from agent performance, leaving systems unable to adapt to their own learning gaps. This research demonstrates a closed-loop alternative where agents generate their own curriculum through failure analysis.

The framework's innovation lies in treating solving traces as rich training signals. Rather than extracting only reward signals, the system distills recurring failure patterns and repair strategies into reusable skills. These skills then guide generation of targeted repair tasks in real repositories, grounding synthetic data creation in authentic code contexts. Execution-based validation and solver-gradient alignment scoring ensure generated tasks remain both verifiable and pedagogically useful.
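The loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the paper's implementation: the `Trace`, `RepairTask`, `distill_skills`, `execution_validated`, and `evolve_once` names are hypothetical, skills are reduced to recurring failure-pattern labels, and task generation is replaced by a lookup in a hand-written template table. Execution-based validation is modeled as "the test fails on the buggy code and passes on the fix".

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

IntFn = Callable[[int], int]


@dataclass(frozen=True)
class Trace:
    """One solving attempt; `failure_pattern` is a hypothetical label."""
    task_id: str
    solved: bool
    failure_pattern: Optional[str] = None


@dataclass(frozen=True)
class RepairTask:
    """A candidate repair task: buggy code, reference fix, and a test."""
    skill: str
    buggy: IntFn
    fixed: IntFn
    test: Callable[[IntFn], bool]


def distill_skills(traces: List[Trace], min_support: int = 2) -> List[str]:
    """Keep failure patterns that recur in at least `min_support` failed traces."""
    counts = Counter(
        t.failure_pattern
        for t in traces
        if not t.solved and t.failure_pattern is not None
    )
    return [pattern for pattern, n in counts.most_common() if n >= min_support]


def execution_validated(task: RepairTask) -> bool:
    """A task is verifiable if its test rejects the bug and accepts the fix."""
    return (not task.test(task.buggy)) and task.test(task.fixed)


def evolve_once(
    traces: List[Trace], templates: Dict[str, List[RepairTask]]
) -> List[RepairTask]:
    """One iteration: distill skills, propose targeted tasks, filter by execution."""
    kept = []
    for skill in distill_skills(traces):
        for task in templates.get(skill, []):  # stand-in for task generation
            if execution_validated(task):
                kept.append(task)
    return kept


# Toy data (all assumed for illustration).
TRACES = [
    Trace("t1", solved=False, failure_pattern="off_by_one"),
    Trace("t2", solved=False, failure_pattern="off_by_one"),
    Trace("t3", solved=False, failure_pattern="null_check"),
    Trace("t4", solved=True),
]

TEMPLATES = {
    "off_by_one": [
        # Test distinguishes bug from fix: kept.
        RepairTask("off_by_one", lambda n: sum(range(n)),
                   lambda n: sum(range(n + 1)), lambda f: f(3) == 6),
        # Test passes on the buggy code too: rejected by validation.
        RepairTask("off_by_one", lambda n: sum(range(n)),
                   lambda n: sum(range(n + 1)), lambda f: f(0) == 0),
    ],
}
```

Calling `evolve_once(TRACES, TEMPLATES)` keeps only the task whose test separates the bug from the fix. In a real system the kept tasks would feed the next round of training, and the retrained agent would produce new traces for the following iteration.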

For the AI development community, this approach addresses a fundamental scaling challenge: how to keep improving systems without a matching increase in human annotation burden. The consistent improvements across multiple SWE benchmarks suggest the method generalizes beyond a single test suite. Reaching 50.40% on SWE-bench Verified represents meaningful progress on a challenging, reasoning-intensive benchmark.

The implications extend beyond academic benchmarks. If self-evolving agents can reliably improve through a trace-derived curriculum, companies deploying AI coding assistants could reduce their reliance on manually curated training data. This could accelerate development of more capable software engineering agents while lowering annotation costs. Future research should explore whether the framework applies to specialized domains beyond code repair.

Key Takeaways

→ Solving traces can be distilled into structured agent skills that guide targeted task generation for self-improvement.
→ Execution-based validation and solver-gradient alignment ensure synthetic tasks are both verifiable and pedagogically useful.
→ Socratic-SWE achieves 50.40% on SWE-bench Verified through iterative self-evolution without additional human annotation.
→ The closed-loop framework adapts to agent weaknesses across training iterations, unlike fixed mutation-based synthetic data methods.
→ This approach offers a scalable path to reduce dependence on human-annotated training data for AI coding agents.
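
The summary names solver-gradient alignment without defining it. One plausible reading, offered here purely as an assumption, is that a candidate task is scored by how closely the gradient it induces on the solver points in the same direction as the gradient from the solver's known failures. The sketch below uses cosine similarity over plain Python lists; `cosine`, `select_aligned`, the threshold, and the example vectors are all hypothetical and may differ from what the paper actually does.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_aligned(
    task_grads: Dict[str, List[float]],
    target_grad: List[float],
    threshold: float = 0.5,
) -> List[str]:
    """Return task names scoring at least `threshold`, best-aligned first."""
    scores = {name: cosine(g, target_grad) for name, g in task_grads.items()}
    keep = [name for name, s in scores.items() if s >= threshold]
    return sorted(keep, key=lambda name: -scores[name])


# Toy gradients (assumed): "a" is parallel to the target, "b" orthogonal,
# "c" partly aligned, "d" opposed.
TARGET = [1.0, 0.0]
CANDIDATES = {
    "a": [2.0, 0.0],
    "b": [0.0, 3.0],
    "c": [1.0, 1.0],
    "d": [-1.0, 0.0],
}
```

With these toy values, `select_aligned(CANDIDATES, TARGET)` keeps `a` and `c` and drops the orthogonal and opposed candidates, which is the filtering behavior the "pedagogically useful" criterion seems to aim for.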