y0news

#benchmark-progress News & Analysis

1 article tagged with #benchmark-progress. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 5h ago · 7/10

Socratic-SWE: Self-Evolving Coding Agents via Trace-Derived Agent Skills

Socratic-SWE introduces a self-evolving framework that improves LLM-driven software engineering agents by distilling their solving traces into structured skills that guide targeted task generation. The approach achieves 50.40% on SWE-bench Verified after three iterations, demonstrating that agent weaknesses can fuel scalable, execution-validated training data creation without manual intervention.