AI · Neutral · arXiv – CS AI · 9h ago · 6/10
Gradient Descent with Large Step Size Restores Symmetry in Deep Linear Networks with Multi-Pathway
Researchers show that discrete gradient descent with large step sizes produces fundamentally different training dynamics in deep linear networks than continuous gradient flow. Their analysis finds that, in the later stages of training, multi-pathway networks redistribute signal across pathways rather than concentrating it in a single pathway. This challenges prevailing theoretical predictions and suggests that the optimization step size significantly influences how neural networks learn representations.
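To make the comparison concrete, here is a minimal sketch of the kind of experiment the summary describes. It is not the paper's setup: the scalar two-pathway model, the initial weights, the target, and the name `train_pathways` are all assumptions chosen for illustration. A small step size stands in for gradient flow, a larger one for large-step gradient descent, and the per-pathway strengths are reported for each run. A toy this small is not expected to reproduce the paper's findings; it only shows how such a comparison could be set up.

```python
import numpy as np

def train_pathways(step_size, steps=5000, target=1.0):
    """Run plain gradient descent on a toy two-pathway linear network.

    Hypothetical model (not from the paper): f = sum_p a[p] * b[p],
    with loss 0.5 * (target - f)**2.
    Returns the per-pathway products a[p] * b[p] after training.
    """
    # Illustrative, deliberately unbalanced initial weights (an assumption).
    a = np.array([0.3, 0.1])
    b = np.array([0.2, 0.15])
    for _ in range(steps):
        err = target - np.sum(a * b)   # residual of the summed pathways
        grad_a = -err * b              # dL/da for each pathway
        grad_b = -err * a              # dL/db for each pathway
        # Simultaneous update of both layers from the same gradients.
        a, b = a - step_size * grad_a, b - step_size * grad_b
    return a * b

small = train_pathways(step_size=0.01)  # small steps approximate gradient flow
large = train_pathways(step_size=0.5)   # a larger discrete step size
print("small-step pathway strengths: ", small)
print("larger-step pathway strengths:", large)
```

Comparing the two printed vectors shows how the fitted signal is split between pathways under each step size. Testing the paper's claims would require its actual architecture and step-size regime.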