y0news

#self-evolution News & Analysis

13 articles tagged with #self-evolution. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 28 · 7/10

Confidence-Orchestrated Self-Evolution against Uncertain LLM Feedback

Researchers propose COSE, a self-evolution framework for large language models that uses confidence signals to filter noisy self-generated training feedback without external verifiers. The method demonstrates consistent improvements across 19 benchmarks and multiple model sizes (0.6B–4B parameters), achieving state-of-the-art performance in reasoning and mathematics tasks.

🧠 Llama
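The summary does not spell out how COSE turns confidence into a filter. As a rough illustration only, the sketch below uses a common stand-in, self-consistency agreement. It samples several answers per question, treats the majority answer as a pseudo-label, and keeps that label only if the share of agreeing samples clears a threshold. The function name `filter_by_confidence` and the 0.6 threshold are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, List, Tuple


def filter_by_confidence(
    samples: Dict[str, List[str]], threshold: float = 0.6
) -> List[Tuple[str, str, float]]:
    """Keep (question, pseudo_label, confidence) triples whose agreement >= threshold.

    Hypothetical illustration: "confidence" here is the fraction of sampled
    answers that match the most common answer (self-consistency), not COSE's
    actual confidence signal.
    """
    kept = []
    for question, answers in samples.items():
        if not answers:
            continue  # nothing sampled, nothing to label
        label, count = Counter(answers).most_common(1)[0]
        confidence = count / len(answers)
        if confidence >= threshold:
            kept.append((question, label, confidence))
    return kept


samples = {
    "q1": ["42", "42", "42", "41", "42"],  # 4/5 agree -> kept
    "q2": ["7", "8", "9", "7", "10"],      # 2/5 agree -> dropped as too noisy
}
print(filter_by_confidence(samples))  # [('q1', '42', 0.8)]
```

Only the kept triples would then be used as self-generated training data; the noisy question is discarded rather than trained on.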
AI · Bullish · arXiv – CS AI · May 12 · 7/10

G-Zero: Self-Play for Open-Ended Generation from Zero Data

Researchers introduce G-Zero, a verifier-free framework that enables large language models to improve autonomously through self-play without relying on external judges or proxy models. The approach uses an intrinsic reward mechanism called Hint-δ to identify and address the Generator model's blind spots, achieving scalable self-evolution across unverifiable domains.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

SkillOS: Learning Skill Curation for Self-Evolving Agents

Researchers introduce SkillOS, a reinforcement learning framework that enables LLM-based agents to autonomously curate and evolve reusable skills from experience rather than relying on manual intervention. The system pairs a frozen agent executor with a trainable skill curator that manages an external skill repository, demonstrating consistent improvements in effectiveness and efficiency across multi-turn and single-turn tasks while generalizing across different agent architectures.
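To make the executor/curator split concrete, here is a minimal sketch of an external skill repository, assuming that skills are plain-text entries carrying usage statistics. SkillOS trains its curator with reinforcement learning. The `curate` function below is a hand-written pruning heuristic standing in for that learned policy, and every class and function name is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Skill:
    """One reusable skill stored outside the frozen executor's weights."""
    name: str
    description: str
    uses: int = 0
    successes: int = 0

    @property
    def success_rate(self) -> float:
        return self.successes / self.uses if self.uses else 0.0


@dataclass
class SkillRepository:
    skills: Dict[str, Skill] = field(default_factory=dict)

    def add(self, name: str, description: str) -> None:
        # Ignore duplicates so re-adding a skill keeps its statistics.
        self.skills.setdefault(name, Skill(name, description))

    def record(self, name: str, success: bool) -> None:
        # Called after the executor applies a skill to a task.
        skill = self.skills[name]
        skill.uses += 1
        skill.successes += int(success)

    def retrieve(self, task: str, k: int = 2) -> List[Skill]:
        # Rank skills by word overlap between the task and each description.
        words = set(task.lower().split())
        ranked = sorted(
            self.skills.values(),
            key=lambda s: len(words & set(s.description.lower().split())),
            reverse=True,
        )
        return ranked[:k]


def curate(repo: SkillRepository, min_uses: int = 3, min_rate: float = 0.5) -> None:
    """Heuristic stand-in for a learned curator: drop skills that keep failing."""
    for name in list(repo.skills):
        skill = repo.skills[name]
        if skill.uses >= min_uses and skill.success_rate < min_rate:
            del repo.skills[name]


repo = SkillRepository()
repo.add("open_drawer", "open a drawer then look inside")
repo.add("heat_item", "heat an item in the microwave")
for ok in (False, False, True):
    repo.record("heat_item", ok)
curate(repo)
print(sorted(repo.skills))  # ['open_drawer']
```

The executor itself never changes in this setup. Only the repository contents evolve, which mirrors the frozen-executor, trainable-curator division described above.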

AI · Bullish · arXiv – CS AI · Apr 7 · 7/10

Stabilizing Unsupervised Self-Evolution of MLLMs via Continuous Softened Retracing reSampling

Researchers propose Continuous Softened Retracing reSampling (CSRS) to improve the self-evolution of multimodal large language models (MLLMs) by addressing biases in their feedback mechanisms. The method uses continuous reward signals instead of binary rewards and, with Qwen2.5-VL-7B, achieves state-of-the-art results on mathematical reasoning benchmarks such as MathVision.
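The contrast between binary and continuous rewards can be shown on a group of sampled answers to one question, assuming label-free training in which the majority vote serves as the pseudo-label. This is a generic illustration, not the CSRS reward itself. The binary version gives zero credit to every non-majority answer, while the continuous version scores each answer by the share of rollouts that agree with it.

```python
from collections import Counter
from typing import List


def binary_rewards(answers: List[str]) -> List[float]:
    """1.0 if a rollout matches the majority-vote pseudo-label, else 0.0."""
    majority = Counter(answers).most_common(1)[0][0]
    return [1.0 if a == majority else 0.0 for a in answers]


def continuous_rewards(answers: List[str]) -> List[float]:
    """Reward each rollout by the fraction of rollouts that agree with it."""
    counts = Counter(answers)
    n = len(answers)
    return [counts[a] / n for a in answers]


rollouts = ["12", "12", "15", "15", "15", "9"]
print(binary_rewards(rollouts))                            # [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]
print([round(r, 2) for r in continuous_rewards(rollouts)])  # [0.33, 0.33, 0.5, 0.5, 0.5, 0.17]
```

Here the near-tie answer "12" still earns partial credit under the continuous scheme instead of being scored the same as the outlier "9".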

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

SAGE: Multi-Agent Self-Evolution for LLM Reasoning

Researchers introduce SAGE, a multi-agent framework that improves large language model reasoning through self-evolution using four specialized agents. The system achieves significant performance gains on coding and mathematics benchmarks without requiring large human-labeled datasets.

AI · Bullish · arXiv – CS AI · Mar 4 · 7/10

Self-Play Only Evolves When Self-Synthetic Pipeline Ensures Learnable Information Gain

Researchers propose a framework for sustainable AI self-evolution through triadic roles (Proposer, Solver, Verifier) that ensures learnable information gain across iterations. The study identifies three key system designs to prevent the common plateau effect in self-play AI systems: asymmetric co-evolution, capacity growth, and proactive information seeking.
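The Proposer/Solver/Verifier loop can be sketched on a toy addition task, assuming that a pass-rate filter serves as a crude proxy for "learnable information gain". Problems the Solver always solves or never solves are discarded, and only those in between are kept as training signal. None of the paper's three design remedies is modeled here, and all names, the error model, and the 0.3 error slope are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Problem:
    a: int
    b: int


Solver = Callable[[Problem], int]


def propose(rng: random.Random, digits: int, n: int) -> List[Problem]:
    """Proposer: emits addition problems; more digits stands in for harder tasks."""
    hi = 10 ** digits
    return [Problem(rng.randrange(hi), rng.randrange(hi)) for _ in range(n)]


def make_solver(rng: random.Random, skill_digits: int) -> Solver:
    """Toy Solver: exact up to `skill_digits`, increasingly error-prone beyond."""
    def solve(p: Problem) -> int:
        size = len(str(max(p.a, p.b)))
        p_error = min(1.0, max(0.0, (size - skill_digits) * 0.3))
        noise = rng.choice([-1, 1]) if rng.random() < p_error else 0
        return p.a + p.b + noise
    return solve


def verify(p: Problem, answer: int) -> bool:
    """Verifier: exact check, possible only because the toy task is arithmetic."""
    return answer == p.a + p.b


def learnable(
    problems: List[Problem], solve: Solver, attempts: int = 8
) -> List[Tuple[Problem, float]]:
    """Keep problems the Solver sometimes, but not always, gets right."""
    kept = []
    for p in problems:
        rate = sum(verify(p, solve(p)) for _ in range(attempts)) / attempts
        if 0.0 < rate < 1.0:
            kept.append((p, rate))
    return kept


rng = random.Random(0)
solver = make_solver(rng, skill_digits=2)
batch = propose(rng, digits=4, n=20)
for p, rate in learnable(batch, solver):
    print(p, rate)
```

In a real self-play system the Proposer would be updated to generate more problems near this frontier; here it stays fixed, which is exactly the kind of setup the study associates with plateaus.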

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents

In a paper posted to arXiv, researchers present findings that challenge assumptions about LLM agent capabilities, showing that a model's base performance does not predict its ability to self-evolve through harness updates. The study identifies two distinct capabilities, harness-updating and harness-benefit, with counterintuitive results: mid-tier models benefit most from self-evolution, while strong models benefit less.

🧠 Claude
AI · Neutral · arXiv – CS AI · May 29 · 6/10

BenchTrace: A Benchmark for Testing Reflection Ability and Controlled Evolution in LLM Agents

Researchers introduce BenchTrace, a benchmark framework for evaluating how well large language model agents learn from failures through reflection and self-evolution. Testing on Qwen3-32B and GPT-4.1 reveals significant limitations: both models achieve below 30% accuracy on reflection tasks, struggle with diagnosis, and experience performance degradation as noise accumulates in their learning processes.

🧠 GPT-4
AI · Neutral · arXiv – CS AI · May 29 · 6/10

PTCG-Bench: Can LLM Agents Master Pokémon Trading Card Game?

Researchers introduce PTCG-Bench, a benchmark using the Pokémon Trading Card Game to evaluate how well large language model agents can master complex strategic games and improve through self-experience. The study reveals that while LLM agents demonstrate competent gameplay, they struggle with sustained self-evolution and are heavily influenced by system design choices.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Do Self-Evolving Agents Forget? Capability Degradation and Preservation in Lifelong LLM Agent Adaptation

Researchers identify capability erosion in self-evolving LLM agents, where systems adapting to new tasks progressively lose previously learned abilities across workflow, skill, model, and memory dimensions. The study proposes Capability-Preserving Evolution (CPE), a stabilization framework that maintains performance on existing tasks while enabling new adaptations, demonstrating improvements in retained capability stability across all evolution channels.

🧠 GPT-5
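The summary does not say how erosion is measured. One standard way to quantify it in the continual-learning literature is average forgetting: for each earlier task, the gap between the best accuracy reached before the final adaptation step and the accuracy after it. The sketch below computes that metric from an accuracy matrix. It is not necessarily the measure the study uses, CPE itself is not modeled, and the numbers in the example are made up for illustration.

```python
from typing import List


def average_forgetting(acc: List[List[float]]) -> float:
    """Average drop from best-earlier to final accuracy over previously learned tasks.

    acc[i][j] is the accuracy on task j after adaptation step i, where step i
    learns task i (so acc is square and rows are in time order). The last task
    is excluded because it has had no chance to be forgotten yet.
    """
    if len(acc) < 2:
        raise ValueError("need at least two adaptation steps")
    final = acc[-1]
    drops = []
    for j in range(len(final) - 1):
        best_before = max(acc[i][j] for i in range(len(acc) - 1))
        drops.append(best_before - final[j])
    return sum(drops) / len(drops)


# Illustrative numbers only: three adaptation steps, three tasks.
acc = [
    [0.80, 0.10, 0.05],
    [0.70, 0.85, 0.10],
    [0.60, 0.75, 0.90],
]
print(round(average_forgetting(acc), 2))  # 0.15
```

A capability-preserving method would aim to push this number toward zero while keeping the diagonal (accuracy on each newly learned task) high.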
AI · Neutral · arXiv – CS AI · May 12 · 6/10

MAGE: Multi-Agent Self-Evolution with Co-Evolutionary Knowledge Graphs

MAGE introduces a novel framework for self-evolving language model agents that uses co-evolutionary knowledge graphs to preserve learned knowledge across iterations without modifying the base model. The system externalizes learning into structured memory subgraphs, enabling frozen backbone models to improve through retrieved guidance while maintaining inference stability across nine diverse benchmarks.

AI · Bullish · arXiv – CS AI · May 12 · 6/10

EmbodiSkill: Skill-Aware Reflection for Self-Evolving Embodied Agents

EmbodiSkill introduces a training-free framework enabling embodied AI agents to autonomously improve their skills through reflection on task execution trajectories. By distinguishing between skill deficiencies and execution lapses, the system allows frozen language models to achieve significantly higher task success rates, with a Qwen 3.5-27B model reaching a 93.28% success rate on the ALFWorld benchmark.

🧠 GPT-5
AI · Neutral · arXiv – CS AI · Mar 16 · 4/10

Steve-Evolving: Open-World Embodied Self-Evolution via Fine-Grained Diagnosis and Dual-Track Knowledge Distillation

Researchers introduce Steve-Evolving, a new AI framework for open-world embodied agents that uses fine-grained diagnosis and knowledge distillation to improve long-horizon task performance. The system organizes interaction experiences into structured tuples and continuously evolves without model parameter updates, showing improvements in Minecraft testing environments.