OpenAI News · Jul 4
Learning Montezuma’s Revenge from a single demonstration
OpenAI researchers trained an agent to a score of 74,500 on Montezuma's Revenge using reinforcement learning from a single human demonstration. The agent plays episodes that start from carefully chosen states in the demonstration and optimizes the game score with PPO, the same reinforcement learning algorithm behind OpenAI Five.
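The summary does not spell out how the start states are chosen. As a rough illustration only, the sketch below assumes a backward-moving schedule: episodes begin near the end of the demonstration, and the start point moves earlier once the agent succeeds often enough. The class name `DemoStartCurriculum`, the threshold, and the step size are all hypothetical, and the PPO update itself is not shown.

```python
import random
from dataclasses import dataclass


@dataclass
class DemoStartCurriculum:
    """Hypothetical helper: picks episode start states from one demonstration."""

    demo_length: int                 # number of saved states in the demonstration (>= 1)
    success_threshold: float = 0.2   # assumed value, not taken from the source
    step_back: int = 1               # assumed value, not taken from the source

    def __post_init__(self) -> None:
        if self.demo_length < 1:
            raise ValueError("demo_length must be at least 1")
        # Begin at the last saved state, where very little remains to be learned.
        self.start = self.demo_length - 1

    def sample_start(self, rng: random.Random) -> int:
        """Return the index of the demonstration state to reset the episode to."""
        # Sample from the current start point or the state right after it, if any.
        hi = min(self.start + 2, self.demo_length)
        return rng.randrange(self.start, hi)

    def update(self, success_rate: float) -> None:
        """Move the start point earlier once the agent does well enough from here."""
        if success_rate >= self.success_threshold:
            self.start = max(self.start - self.step_back, 0)


# Usage: a five-state demonstration, with success rates reported by the trainer.
curriculum = DemoStartCurriculum(demo_length=5)
curriculum.update(success_rate=0.5)   # good enough: start moves from 4 to 3
curriculum.update(success_rate=0.0)   # not good enough: start stays at 3
```

In a full training loop, the index from `sample_start` would be used to restore the emulator to the matching demonstration state before each episode, and the measured success rate would come from the PPO rollouts.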