Socratic-SWE: Self-Evolving Coding Agents via Trace-Derived Agent Skills
Socratic-SWE introduces a self-evolving framework that improves LLM-driven software engineering agents by distilling their solving traces into structured skills that guide targeted task generation. The approach achieves 50.40% on SWE-bench Verified after three iterations, demonstrating that agent weaknesses can fuel scalable, execution-validated training data creation without manual intervention.
Socratic-SWE addresses a critical bottleneck in AI agent development: the scarcity of high-quality training data for software engineering tasks. Traditional synthetic data generation relies on fixed mutation rules disconnected from agent performance, leaving systems unable to adapt to their own learning gaps. This research demonstrates a closed-loop alternative where agents generate their own curriculum through failure analysis.
The framework's innovation lies in treating solving traces as rich training signals. Rather than extracting only reward signals, the system distills recurring failure patterns and repair strategies into reusable skills. These skills then guide generation of targeted repair tasks in real repositories, grounding synthetic data creation in authentic code contexts. Execution-based validation and solver-gradient alignment scoring ensure generated tasks remain both verifiable and pedagogically useful.
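The loop described above can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' implementation: the `Trace`, `Skill`, and `Task` records, the `failure_tag` field, and the `validate` and `score` callables are all hypothetical names introduced here. In particular, `score` is a stand-in for solver-gradient alignment scoring, whose actual computation is not described in this summary.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trace:
    """One solving attempt (hypothetical record format)."""
    task_id: str
    failure_tag: str  # e.g. "wrong-file-localization"; empty if solved
    solved: bool


@dataclass
class Skill:
    """A recurring failure pattern distilled from traces."""
    name: str
    support: int  # number of failed traces showing this pattern


@dataclass
class Task:
    """A generated repair task targeting one skill."""
    skill: str
    description: str
    alignment: float = 0.0


def distill_skills(traces: List[Trace], min_support: int = 2) -> List[Skill]:
    # Keep only failure patterns that recur at least min_support times.
    counts = Counter(
        t.failure_tag for t in traces if not t.solved and t.failure_tag
    )
    return [Skill(name, n) for name, n in counts.most_common() if n >= min_support]


def generate_tasks(skills: List[Skill], per_skill: int = 2) -> List[Task]:
    # Placeholder generator: a real system would synthesize repair tasks
    # inside actual repositories.
    return [
        Task(s.name, f"repair task #{i} targeting {s.name}")
        for s in skills
        for i in range(per_skill)
    ]


def filter_tasks(
    tasks: List[Task],
    validate: Callable[[Task], bool],   # assumed execution-based check
    score: Callable[[Task], float],     # assumed alignment scorer
    threshold: float = 0.5,
) -> List[Task]:
    kept = []
    for task in tasks:
        if not validate(task):
            continue  # drop tasks that cannot be verified by execution
        task.alignment = score(task)
        if task.alignment >= threshold:
            kept.append(task)
    return kept
```

In this sketch, one iteration would call `distill_skills` on the latest traces, pass the result to `generate_tasks`, and train only on what `filter_tasks` keeps. Repeating that cycle is what makes the curriculum track the agent's current weaknesses.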
For the AI development community, this approach addresses a fundamental challenge: how to keep improving systems without a corresponding growth in human annotation burden. The consistent improvements across multiple SWE benchmarks suggest the method generalizes beyond specific test suites. Reaching 50.40% on SWE-bench Verified represents meaningful progress in a challenging, reasoning-intensive domain.
The implications extend beyond academic benchmarks. If self-evolving agents can reliably improve through a trace-derived curriculum, companies deploying AI coding assistants could reduce their reliance on manually curated training data. This could accelerate the development of more capable software engineering agents while lowering annotation costs. Future research should explore whether the framework applies to specialized domains other than code repair.
- Solving traces can be distilled into structured agent skills that guide targeted task generation for self-improvement.
- Execution-based validation and solver-gradient alignment ensure synthetic tasks are both verifiable and pedagogically useful.
- Socratic-SWE achieves 50.40% on SWE-bench Verified through iterative self-evolution without additional human annotation.
- The closed-loop framework adapts to agent weaknesses across training iterations, unlike fixed mutation-based synthetic data methods.
- This approach offers a scalable path to reduce dependence on human-annotated training data for AI coding agents.