AI · Neutral · arXiv – CS AI · 7h ago · 6/10
When are LLMs Sufficient Policy Optimizers for Sequential RL Tasks?
Researchers introduce Prompted Policy Optimization (PromptPO), a method that uses large language models as black-box policy optimizers for reinforcement learning tasks. The approach matches or outperforms traditional RL algorithms in exploration-heavy and robotics domains while requiring fewer environment interactions, though it underperforms on continuous control benchmarks such as MuJoCo.