AI · Neutral · arXiv – CS AI · 9h ago · 6/10
Gradient descent at the Edge of Stability: free energy model and kinetic description of the two-layer network
Researchers propose a continuous-time mathematical model of gradient descent in the Edge of Stability regime, where large learning rates cause training to oscillate. The model introduces an effective free energy that combines the risk with a curvature-related term, which the authors use to better predict training dynamics in wide two-layer networks. The approach is validated on matrix factorization and CIFAR-10 tasks.
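
For readers unfamiliar with why large learning rates cause oscillations, the following is a minimal sketch of the classical stability threshold for gradient descent on a one-dimensional quadratic. It is not the paper's free energy model; the function name `gd_on_quadratic` and the specific curvature and learning-rate values are assumptions chosen purely for illustration. On f(x) = 0.5 * c * x^2, each step multiplies x by (1 - lr * c), so the iterates flip sign once lr > 1/c and diverge once lr > 2/c.

```python
def gd_on_quadratic(curvature, lr, x0=1.0, steps=50):
    """Run plain gradient descent on f(x) = 0.5 * curvature * x**2.

    Illustrative helper (hypothetical name), not taken from the paper.
    Returns the list of iterates, starting with x0.
    """
    x = x0
    trajectory = [x]
    for _ in range(steps):
        grad = curvature * x   # derivative of 0.5 * curvature * x**2
        x = x - lr * grad      # gradient descent update
        trajectory.append(x)
    return trajectory


curvature = 4.0
threshold = 2.0 / curvature    # learning rates above this value diverge

# Just below the threshold: iterates alternate in sign but shrink.
stable = gd_on_quadratic(curvature, lr=0.45)

# Just above the threshold: iterates alternate in sign and grow.
unstable = gd_on_quadratic(curvature, lr=0.55)

print("threshold lr:", threshold)
print("below threshold, |x| after 50 steps:", abs(stable[-1]))
print("above threshold, |x| after 50 steps:", abs(unstable[-1]))
```

In a neural network the curvature is not fixed, which is what makes the Edge of Stability regime more subtle than this toy case: the sketch only shows the threshold at which oscillations stop decaying.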