AINeutralarXiv – CS AI · Apr 156/10
🧠
No More Stale Feedback: Co-Evolving Critics for Open-World Agent Learning
Researchers introduce ECHO, a reinforcement learning framework that co-evolves policy and critic models to address the problem of stale feedback in LLM agent training. The system uses cascaded rollouts and saturation-aware gain shaping to maintain synchronized, relevant critique as the agent's behavior improves over time, demonstrating enhanced stability and success rates in complex environments.