AI · Neutral · arXiv – CS AI · 2d ago · 6/10
No More Stale Feedback: Co-Evolving Critics for Open-World Agent Learning
Researchers introduce ECHO, a reinforcement learning framework that co-evolves policy and critic models to address the problem of stale feedback in LLM agent training. The system uses cascaded rollouts and saturation-aware gain shaping to maintain synchronized, relevant critique as the agent's behavior improves over time, demonstrating enhanced stability and success rates in complex environments.