AI · Neutral · arXiv – CS AI · 15h ago · 6/10
Not All Transitions Matter: Evidence from PPO
Researchers propose a simple technique for stabilizing PPO training in reinforcement learning: randomly dropping 25% of the transitions collected during rollouts. The method removes gradient redundancy caused by causally dependent state sequences and improves training consistency across multiple environments without modifying the algorithm itself.
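The dropping step could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The summary does not say where the drop is applied, so the sketch assumes advantages are computed on the full rollout first and a uniformly random 75% of transitions is then kept for the PPO update. The function name `drop_transitions`, the dictionary layout of the batch, and the rollout size of 2048 are all hypothetical.

```python
import numpy as np

def drop_transitions(batch, drop_frac=0.25, rng=None):
    """Return a copy of `batch` with a random `drop_frac` of transitions removed.

    `batch` is a hypothetical dict of arrays that share the same first
    dimension (one row per transition). Assumption: per-transition
    quantities such as advantages were computed before this call.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(next(iter(batch.values())))
    n_keep = n - int(round(drop_frac * n))
    # Sample indices to keep without replacement; sort to preserve time order.
    keep = np.sort(rng.choice(n, size=n_keep, replace=False))
    return {name: values[keep] for name, values in batch.items()}

# Toy rollout with made-up shapes, for illustration only.
rng = np.random.default_rng(0)
rollout = {
    "obs": rng.normal(size=(2048, 4)),
    "actions": rng.integers(0, 2, size=2048),
    "advantages": rng.normal(size=2048),
}
kept = drop_transitions(rollout, drop_frac=0.25, rng=rng)
print(kept["obs"].shape)  # (1536, 4)
```

The reduced batch would then be fed to an otherwise unchanged PPO update, which matches the claim that no algorithmic modification is needed.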