#deep-learning-theory News & Analysis

14 articles tagged with #deep-learning-theory. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Jun 25 · 7/10

Learning Non-Vacuous Generalization Bounds from Optimization

Researchers have developed a non-vacuous generalization bound for deep neural networks by analyzing stochastic gradient descent through the lens of fractional Brownian motion, demonstrating theoretical guarantees on networks like ResNet and Vision Transformer trained on ImageNet-1K. This addresses a long-standing gap between theoretical bounds and practical neural network performance.

AI · Neutral · arXiv – CS AI · Jun 23 · 7/10

All Routes Lead to Collapse

Researchers demonstrate that attention sinks, representation collapse, and norm stratification—previously thought to be transformer-specific problems—are universal behaviors of content-based routing systems with mismatched metrics. The study reveals this collapse pattern occurs across diverse architectures including softmax attention, graph attention, state-space models, and recurrent mixers, suggesting the issue stems from fundamental routing mechanics rather than transformer design.

AI · Neutral · arXiv – CS AI · May 29 · 7/10

The Hamilton-Jacobi Theory of Deep Learning

Researchers establish a mathematical framework connecting neural network training to Hamilton-Jacobi partial differential equations, showing that gradient descent searches through solutions to viscous PDEs. This theoretical unification applies across major architectures including residual networks and transformers, with implications for understanding generalization, adversarial robustness, and interpretability.

AI · Neutral · arXiv – CS AI · May 12 · 7/10

Flag Varieties: A Geometric Framework for Deep Network Alignment

Researchers establish a unified geometric framework using flag varieties to explain alignment phenomena in deep neural networks, proving that subspace intersection dimension is the fundamental observable governing how weight matrices organize themselves. The work provides theoretical foundations for previously empirical observations about gradient flow, Neural Collapse, and representation similarity, with implications for understanding how neural networks learn.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

How Deep Are Deep GPs, Really? A Sharp Threshold and a Non-Gaussian Limit for Compositional GPs

Researchers establish a sharp bandwidth threshold for deep Gaussian processes, proving that below this threshold compositional GPs converge to non-Gaussian, non-degenerate limit distributions rather than degenerating to constant functions. This advances theoretical understanding of deep Bayesian models and their limiting behavior as network depth increases.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

Deciphering Two Training Clocks in Grokking via Deep Linear Network Theory with Conditional ReLU Reduction

Researchers formalize the grokking phenomenon—where neural networks fit training data quickly but learn generalizable rules slowly—by analyzing deep linear networks and ReLU MLPs. The study identifies two distinct training timescales: fast classification loss decay and slower representation simplification, with implications for understanding how neural networks generalize.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

Scaling Laws and Spectra of Shallow Neural Networks in the Feature Learning Regime

Researchers present a theoretical framework analyzing scaling laws for shallow neural networks in the feature learning regime, deriving phase diagrams that connect sample complexity and weight decay to risk exponents. The work bridges empirical observations in deep learning with rigorous mathematical analysis, establishing links between weight spectrum properties and generalization performance through matrix compressed sensing and LASSO theory.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments

Researchers present a theoretical framework for deep reinforcement learning in continuous environments, built on continuous-time stochastic processes and stochastic control theory. The work establishes a two-timescale model for actor-critic algorithms with neural networks and derives equations describing how state distributions evolve during training in the infinite-width limit.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

Two Speeds of Learning: A Representation-Readout Decomposition of Grokking and Double Descent

Researchers propose a representation-readout decomposition framework that explains anomalous neural network training phenomena like grokking and double descent by analyzing two competing learning processes: representation learning in encoders and readout calibration in classifiers. The framework provides task-agnostic diagnostics that reveal these phenomena stem from fluctuations in relative learning speeds rather than mysterious delays, challenging existing lazy-to-rich learning theories.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Feature Repulsion and Spectral Lock-in: An Empirical Study of Two-Layer Network Grokking

Researchers empirically validate theoretical predictions about feature repulsion in neural network grokking, discovering that while the mathematical sign structure holds consistently across activation functions, the spectral signature of this mechanism in weight updates depends critically on activation type—appearing sharply in quadratic activations but remaining invisible in ReLU networks.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Spectral Dynamics in Deep Networks: Feature Learning, Outlier Escape, and Learning Rate Transfer

Researchers develop a dynamical mean-field theory framework to analyze how neural network weight spectra evolve during training, revealing that different parameterization schemes (μP vs NTK) produce fundamentally different outlier dynamics. The findings suggest that neural scaling laws and hyperparameter transfer depend critically on how outlier eigenvalues behave, with implications for understanding deep learning generalization and optimization.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

It's Not a Lottery, It's a Race: Understanding How Gradient Descent Adapts the Network's Capacity to the Task

Researchers have identified three fundamental dynamical principles—mutual alignment, unlocking, and racing—that explain how gradient descent training reduces neural network capacity to match task requirements. This theoretical advancement clarifies the mechanisms behind the lottery ticket hypothesis and why certain initial neuron conditions lead to higher weight norms, bridging a significant gap between empirical neural network success and theoretical understanding.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

Sparse-Aware Neural Networks for Nonlinear Functionals: Mitigating the Exponential Dependence on Dimension

Researchers propose a sparse-aware neural network framework that combines convolutional architectures with fully connected networks to improve operator learning over infinite-dimensional function spaces. The approach mitigates the curse of dimensionality and lowers the sample complexity needed to approximate nonlinear functionals, with improved theoretical guarantees for both deterministic and random sampling schemes.

AI · Neutral · arXiv – CS AI · Apr 14 · 5/10

Wolkowicz-Styan Upper Bound on the Hessian Eigenspectrum for Cross-Entropy Loss in Nonlinear Smooth Neural Networks

Researchers derive a closed-form upper bound for the Hessian eigenspectrum of cross-entropy loss in smooth nonlinear neural networks using the Wolkowicz-Styan bound. This analytical approach avoids numerical computation and expresses loss sharpness as a function of network parameters, training sample orthogonality, and layer dimensions—advancing theoretical understanding of the relationship between loss geometry and generalization.
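
For readers unfamiliar with the bound named in the title, the sketch below shows the classical Wolkowicz-Styan upper bound on the largest eigenvalue of a real symmetric matrix, which needs only the traces of A and A². It is a generic illustration, not the paper's Hessian-specific expression; the function name `wolkowicz_styan_upper` and the sample matrix are assumptions made for this example.

```python
import math


def wolkowicz_styan_upper(a):
    """Upper bound on the largest eigenvalue of a real symmetric matrix.

    Uses lambda_max <= m + s * sqrt(n - 1), where
    m = tr(A) / n and s^2 = tr(A^2) / n - m^2.
    No eigendecomposition is performed.
    """
    n = len(a)
    trace = sum(a[i][i] for i in range(n))
    # tr(A^2) = sum over i, j of a[i][j] * a[j][i]
    trace_sq = sum(a[i][j] * a[j][i] for i in range(n) for j in range(n))
    m = trace / n
    # Clamp at zero to guard against tiny negative values from rounding.
    s2 = max(trace_sq / n - m * m, 0.0)
    return m + math.sqrt(s2) * math.sqrt(n - 1)


# Illustrative symmetric matrix with eigenvalues 1 and 3.
example = [[2.0, 1.0], [1.0, 2.0]]
print(wolkowicz_styan_upper(example))
```

For a 2×2 symmetric matrix the bound is tight, so the example prints the true largest eigenvalue, 3.0. For larger matrices it generally overestimates, which is the cost of avoiding numerical eigenvalue computation.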