AI · Neutral · arXiv - CS AI · 8h ago · 6/10
EXPO: Stable Reinforcement Learning with Expressive Policies
Researchers introduce EXPO, a reinforcement learning algorithm that trains expressive policies (such as diffusion models) more sample-efficiently by not optimizing the expressive policy directly against a value function. Instead, a lightweight Gaussian policy edits the actions proposed by a base policy. The reported result is a 2-3x improvement in sample efficiency in both offline-to-online and fine-tuning settings.
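The action-editing idea in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: `base_policy`, `edit_policy`, `act`, the linear parameterization, the `edit_scale` bound, and the action range of [-1, 1] are all hypothetical choices made for illustration, and the base policy here is a simple stand-in rather than a real diffusion model.

```python
import numpy as np

def base_policy(state, rng):
    """Hypothetical stand-in for an expressive base policy (e.g. a diffusion model)."""
    return np.tanh(state[:2] + 0.1 * rng.standard_normal(2))

def edit_policy(state, action, weights, log_std, rng, edit_scale=0.1):
    """Hypothetical lightweight Gaussian edit policy.

    Samples from a Gaussian whose mean is a linear function of (state, action),
    then squashes the sample so the edit stays within +/- edit_scale (an assumption).
    """
    features = np.concatenate([state, action])
    mean = weights @ features
    sample = mean + np.exp(log_std) * rng.standard_normal(mean.shape)
    return edit_scale * np.tanh(sample)

def act(state, weights, log_std, rng):
    """Sample a base action, apply a small Gaussian edit, and clip to the action range."""
    base_action = base_policy(state, rng)
    edit = edit_policy(state, base_action, weights, log_std, rng)
    return np.clip(base_action + edit, -1.0, 1.0), base_action

rng = np.random.default_rng(0)
state = np.array([0.5, -0.2, 0.1])
weights = np.zeros((2, 5))      # untrained edit policy: zero-mean edits
log_std = np.full(2, -1.0)
action, base_action = act(state, weights, log_std, rng)
```

In this sketch the edited action never moves more than `edit_scale` away from the base action, so the small Gaussian policy can only nudge what the base policy proposes. How EXPO trains the two policies and chooses among candidate actions is not described in the summary and is left out here.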