
When Learning Rates Go Wrong: Early Structural Signals in PPO Actor-Critic

arXiv – CS AI | Alberto Fernández-Hernández, Cristian Pérez-Corral, Jose I. Mestre, Manuel F. Dolz, Jose Duato, Enrique S. Quintana-Ortí | March 11, 2026 at 04:00 AM

🤖 AI Summary

Researchers introduce the Overfitting-Underfitting Indicator (OUI) to analyze learning rate sensitivity in PPO reinforcement learning systems. The metric can identify problematic learning rates early in training by measuring neural activation patterns, enabling more efficient hyperparameter screening without full training runs.
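
To make "measuring neural activation patterns" concrete, here is a minimal sketch of one way to score how varied a layer's activation sign patterns are across a batch. This is a hypothetical proxy, not the paper's OUI definition, which the summary does not give. The function name `sign_pattern_diversity`, the input layout (samples by units), and the choice of mean pairwise Hamming distance are all assumptions made for illustration.

```python
import numpy as np


def sign_pattern_diversity(pre_activations: np.ndarray) -> float:
    """Mean pairwise normalized Hamming distance between sign patterns.

    ASSUMPTION: an illustrative proxy, not the paper's OUI formula.
    pre_activations has shape (n_samples, n_units).
    Returns a value in [0, 1]: 0 means every sample switches on the same
    units, 1 means every pair of samples disagrees on every unit.
    """
    patterns = (np.asarray(pre_activations) > 0).astype(np.int64)
    if patterns.ndim != 2:
        raise ValueError("expected a 2-D array of shape (n_samples, n_units)")
    n_samples, n_units = patterns.shape
    if n_samples < 2 or n_units == 0:
        raise ValueError("need at least two samples and one unit")

    # Units on which two samples agree: both active or both inactive.
    agree = patterns @ patterns.T + (1 - patterns) @ (1 - patterns).T
    hamming = n_units - agree

    # Average over distinct sample pairs only (strict upper triangle).
    upper = np.triu_indices(n_samples, k=1)
    return float(hamming[upper].mean() / n_units)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch = rng.normal(size=(32, 64))  # stand-in for one hidden layer's pre-activations
    print(f"diversity: {sign_pattern_diversity(batch):.3f}")
```

A statistic of this kind needs only a forward pass over a batch of states, which is why it could plausibly be read off early in training rather than after a full run.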

Key Takeaways

→ The OUI metric can discriminate between learning rate regimes using only 10% of training data across multiple environments.
→ Critic networks achieve the highest returns in an intermediate OUI range, while actor networks perform best at high OUI values.
→ OUI-based screening outperforms traditional early screening methods at identifying promising training runs.
→ The research provides a theoretical connection between learning rates and neural activation sign changes.
→ Combining OUI with early return criteria enables aggressive pruning of unpromising runs with high precision.
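
The combined screening idea in the takeaways can be sketched as a simple filter over early-training statistics. This is a minimal illustration, assuming each run reports an actor OUI, a critic OUI, and an early return at a checkpoint. The names `EarlyRunStats`, `keep_run`, and `screen_runs`, along with every threshold and input value below, are hypothetical placeholders and not values or results from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class EarlyRunStats:
    """Statistics collected at an early checkpoint (hypothetical schema)."""
    name: str
    actor_oui: float
    critic_oui: float
    early_return: float


def keep_run(
    stats: EarlyRunStats,
    *,
    actor_oui_min: float,
    critic_oui_range: Tuple[float, float],
    return_min: float,
) -> bool:
    """Keep a run only if all three early criteria hold.

    ASSUMPTION: the rule mirrors the takeaways (high actor OUI,
    intermediate critic OUI, adequate early return); the thresholds
    are placeholders to be tuned per environment.
    """
    low, high = critic_oui_range
    return (
        stats.actor_oui >= actor_oui_min
        and low <= stats.critic_oui <= high
        and stats.early_return >= return_min
    )


def screen_runs(runs: Iterable[EarlyRunStats], **thresholds) -> List[str]:
    """Return the names of the runs that survive pruning."""
    return [run.name for run in runs if keep_run(run, **thresholds)]


if __name__ == "__main__":
    # Made-up inputs for illustration only; not measurements.
    candidates = [
        EarlyRunStats("lr=1e-5", actor_oui=0.40, critic_oui=0.20, early_return=15.0),
        EarlyRunStats("lr=3e-4", actor_oui=0.85, critic_oui=0.55, early_return=60.0),
        EarlyRunStats("lr=1e-2", actor_oui=0.90, critic_oui=0.95, early_return=70.0),
    ]
    survivors = screen_runs(
        candidates,
        actor_oui_min=0.8,
        critic_oui_range=(0.4, 0.7),
        return_min=50.0,
    )
    print(survivors)
```

Requiring all criteria at once trades recall for precision: fewer runs survive, which matches the "aggressive pruning" framing, but a promising run that starts slowly may be discarded.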