#self-evolution News & Analysis

18 articles tagged with #self-evolution. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

Darwin Mobile Agent: A Roadmap for Self-Evolution

Researchers introduce Darwin Mobile Agent, an open-source infrastructure enabling autonomous reinforcement learning agents to interact with mobile GUIs at scale. The framework addresses data collection bottlenecks through parallel cloud-phone instances and proposes a roadmap to remove human priors from AI agent design, advancing toward truly self-evolving autonomous systems.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory

Researchers introduce SkeMex, a self-evolving skill-based memory framework that enables medical AI agents to improve after deployment without retraining model weights. The system distills clinical interaction trajectories into reusable procedural skills, organized across multiple memory branches, and uses environment feedback to determine which experiences are genuinely useful for future decision-making.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

SIGA: Self-Evolving Coding-Agent Adapters for Scientific Simulation

Researchers introduce SIGA, an AI adapter system that enables general coding agents to operate specialized scientific simulators without extensive domain training. The system achieves a 36x speedup compared to human experts on GEOS multiphysics simulator configuration, demonstrating that lightweight grounding layers can make general AI tools practical for scientific software.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

INFUSER: Influence-Guided Self-Evolution Improves Reasoning

INFUSER is a self-evolution framework that improves language models' reasoning through iterative co-training of a Generator and a Solver, using an influence-aware scoring mechanism rather than difficulty heuristics. The method achieves a 20% relative improvement on mathematical and coding benchmarks, demonstrating that adaptive curriculum learning can outperform larger frozen models.

AI · Bullish · arXiv – CS AI · Jun 4 · 7/10

SePO: Self-Evolving Prompt Agent for System Prompt Optimization

Researchers propose SePO, a self-evolving prompt agent that automatically optimizes system prompts for AI agents by treating the prompt agent's own instructions as an optimization target. The method demonstrates consistent performance gains across five diverse benchmarks, outperforming existing approaches and generalizing to unseen tasks.

AI · Bullish · arXiv – CS AI · May 28 · 7/10

Confidence-Orchestrated Self-Evolution against Uncertain LLM Feedback

Researchers propose COSE, a self-evolution framework for large language models that uses confidence signals to filter noisy self-generated training feedback without external verifiers. The method demonstrates consistent improvements across 19 benchmarks and multiple model sizes (0.6B–4B parameters), achieving state-of-the-art performance in reasoning and mathematics tasks.

🧠 Llama

AI · Bullish · arXiv – CS AI · May 12 · 7/10

G-Zero: Self-Play for Open-Ended Generation from Zero Data

Researchers introduce G-Zero, a verifier-free framework that enables large language models to improve autonomously through self-play without relying on external judges or proxy models. The approach uses an intrinsic reward mechanism called Hint-δ to identify and address the Generator model's blind spots, achieving scalable self-evolution across unverifiable domains.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

SkillOS: Learning Skill Curation for Self-Evolving Agents

Researchers introduce SkillOS, a reinforcement learning framework that enables LLM-based agents to autonomously curate and evolve reusable skills from experience rather than relying on manual intervention. The system pairs a frozen agent executor with a trainable skill curator that manages an external skill repository, demonstrating consistent improvements in effectiveness and efficiency across multi-turn and single-turn tasks while generalizing across different agent architectures.

AI · Bullish · arXiv – CS AI · Apr 7 · 7/10

Stabilizing Unsupervised Self-Evolution of MLLMs via Continuous Softened Retracing reSampling

Researchers propose Continuous Softened Retracing reSampling (CSRS) to improve the self-evolution of Multimodal Large Language Models by addressing biases in feedback mechanisms. The method uses continuous reward signals instead of binary rewards and achieves state-of-the-art results on mathematical reasoning benchmarks like MathVision using Qwen2.5-VL-7B.

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

SAGE: Multi-Agent Self-Evolution for LLM Reasoning

Researchers introduce SAGE, a multi-agent framework that improves large language model reasoning through self-evolution using four specialized agents. The system achieves significant performance gains on coding and mathematics benchmarks without requiring large human-labeled datasets.

AI · Bullish · arXiv – CS AI · Mar 4 · 7/10

Self-Play Only Evolves When Self-Synthetic Pipeline Ensures Learnable Information Gain

Researchers propose a framework for sustainable AI self-evolution through triadic roles (Proposer, Solver, Verifier) that ensures learnable information gain across iterations. The study identifies three key system designs to prevent the common plateau effect in self-play AI systems: asymmetric co-evolution, capacity growth, and proactive information seeking.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents

Researchers present findings that challenge assumptions about LLM agent capabilities, revealing that a model's base performance does not predict its ability to self-evolve through harness updates. The study identifies two distinct capabilities, harness-updating and harness-benefit, with counterintuitive results suggesting that mid-tier models benefit most from self-evolution while strong models benefit less.

🧠 Claude

AI · Neutral · arXiv – CS AI · May 29 · 6/10

BenchTrace: A Benchmark for Testing Reflection Ability and Controlled Evolution in LLM Agents

Researchers introduce BenchTrace, a benchmark framework for evaluating how well large language model agents learn from failures through reflection and self-evolution. Testing on Qwen3-32B and GPT-4.1 reveals significant limitations: both models achieve below 30% accuracy on reflection tasks, struggle with diagnosis, and experience performance degradation as noise accumulates in their learning processes.

🧠 GPT-4

AI · Neutral · arXiv – CS AI · May 29 · 6/10

PTCG-Bench: Can LLM Agents Master Pokémon Trading Card Game?

Researchers introduce PTCG-Bench, a benchmark using the Pokémon Trading Card Game to evaluate how well large language model agents can master complex strategic games and improve through self-experience. The study reveals that while LLM agents demonstrate competent gameplay, they struggle with sustained self-evolution and are heavily influenced by system design choices.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Do Self-Evolving Agents Forget? Capability Degradation and Preservation in Lifelong LLM Agent Adaptation

Researchers identify capability erosion in self-evolving LLM agents, where systems adapting to new tasks progressively lose previously learned abilities across workflow, skill, model, and memory dimensions. The study proposes Capability-Preserving Evolution (CPE), a stabilization framework that maintains performance on existing tasks while enabling new adaptations, demonstrating improvements in retained capability stability across all evolution channels.

🧠 GPT-5

AI · Neutral · arXiv – CS AI · May 12 · 6/10

MAGE: Multi-Agent Self-Evolution with Co-Evolutionary Knowledge Graphs

MAGE introduces a novel framework for self-evolving language model agents that uses co-evolutionary knowledge graphs to preserve learned knowledge across iterations without modifying the base model. The system externalizes learning into structured memory subgraphs, enabling frozen backbone models to improve through retrieved guidance while maintaining inference stability across nine diverse benchmarks.

AI · Bullish · arXiv – CS AI · May 12 · 6/10

EmbodiSkill: Skill-Aware Reflection for Self-Evolving Embodied Agents

EmbodiSkill introduces a training-free framework enabling embodied AI agents to autonomously improve their skills through reflection on task execution trajectories. By distinguishing between skill deficiencies and execution lapses, the system allows frozen language models to achieve significantly higher task success rates, with a Qwen 3.5-27B model reaching 93.28% success on ALFWorld benchmarks.

🧠 GPT-5

AI · Neutral · arXiv – CS AI · Mar 16 · 4/10

Steve-Evolving: Open-World Embodied Self-Evolution via Fine-Grained Diagnosis and Dual-Track Knowledge Distillation

Researchers introduce Steve-Evolving, a new AI framework for open-world embodied agents that uses fine-grained diagnosis and knowledge distillation to improve long-horizon task performance. The system organizes interaction experiences into structured tuples and continuously evolves without model parameter updates, showing improvements in Minecraft testing environments.