y0news

#deep-learning-theory News & Analysis

7 articles tagged with #deep-learning-theory. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · May 12 · 7/10

Flag Varieties: A Geometric Framework for Deep Network Alignment

Researchers establish a unified geometric framework using flag varieties to explain alignment phenomena in deep neural networks, proving that subspace intersection dimension is the fundamental observable governing how weight matrices organize themselves. The work provides theoretical foundations for previously empirical observations about gradient flow, Neural Collapse, and representation similarity, with implications for understanding how neural networks learn.
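
Subspace intersection dimension, the quantity this summary singles out, can be computed with elementary linear algebra through the identity dim(U ∩ V) = dim U + dim V − dim(U + V). Below is a minimal pure-Python sketch. It assumes each subspace is given as a list of spanning vectors, for example the rows of a weight matrix. The names `rank` and `intersection_dim` are hypothetical, and the code does not reproduce the paper's flag-variety construction. For real trained weights the tolerance `tol` matters, because exact intersections are rare in floating point.

```python
def rank(vectors, tol=1e-9):
    """Numerical rank of a list of equal-length vectors (Gaussian elimination, partial pivoting)."""
    rows = [[float(x) for x in v] for v in vectors]
    if not rows:
        return 0
    ncols = len(rows[0])
    r = 0  # number of pivots found so far
    for c in range(ncols):
        # choose the row (at or below r) with the largest entry in column c
        pivot = max(range(r, len(rows)), key=lambda i: abs(rows[i][c]))
        if abs(rows[pivot][c]) <= tol:
            continue  # column is numerically zero below row r
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            for j in range(c, ncols):
                rows[i][j] -= f * rows[r][j]
        r += 1
        if r == len(rows):
            break
    return r


def intersection_dim(U, V, tol=1e-9):
    """dim(span U ∩ span V) = dim U + dim V - dim(span U + span V)."""
    return rank(U, tol) + rank(V, tol) - rank(list(U) + list(V), tol)


if __name__ == "__main__":
    # In R^3: span{e1, e2} and span{e2, e3} share exactly the line span{e2}.
    print(intersection_dim([[1, 0, 0], [0, 1, 0]], [[0, 1, 0], [0, 0, 1]]))  # 1
```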

AI · Neutral · arXiv – CS AI · 15h ago · 6/10

Two Speeds of Learning: A Representation-Readout Decomposition of Grokking and Double Descent

Researchers propose a representation-readout decomposition framework that explains anomalous neural network training phenomena such as grokking and double descent by analyzing two competing learning processes: representation learning in the encoder and readout calibration in the classifier. The framework provides task-agnostic diagnostics, which reveal that these phenomena stem from fluctuations in the relative learning speeds of the two processes rather than from mysterious delays, challenging existing lazy-to-rich learning theories.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Feature Repulsion and Spectral Lock-in: An Empirical Study of Two-Layer Network Grokking

Researchers empirically validate theoretical predictions about feature repulsion in neural network grokking, discovering that while the mathematical sign structure holds consistently across activation functions, the spectral signature of this mechanism in weight updates depends critically on activation type—appearing sharply in quadratic activations but remaining invisible in ReLU networks.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Spectral Dynamics in Deep Networks: Feature Learning, Outlier Escape, and Learning Rate Transfer

Researchers develop a dynamical mean-field theory framework to analyze how neural network weight spectra evolve during training, revealing that different parameterization schemes (μP vs NTK) produce fundamentally different outlier dynamics. The findings suggest that neural scaling laws and hyperparameter transfer depend critically on how outlier eigenvalues behave, with implications for understanding deep learning generalization and optimization.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

It's Not a Lottery, It's a Race: Understanding How Gradient Descent Adapts the Network's Capacity to the Task

Researchers have identified three fundamental dynamical principles (mutual alignment, unlocking, and racing) that explain how gradient descent training reduces a network's capacity to match the requirements of the task. This theoretical advance clarifies the mechanisms behind the lottery ticket hypothesis and explains why neurons with certain initial conditions end up with larger weight norms, narrowing a significant gap between the empirical success of neural networks and their theoretical understanding.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

Sparse-Aware Neural Networks for Nonlinear Functionals: Mitigating the Exponential Dependence on Dimension

Researchers propose a sparse-aware neural network framework that combines convolutional architectures with fully connected networks to improve operator learning over infinite-dimensional function spaces. The approach substantially mitigates the curse of dimensionality and lowers the sample complexity required to approximate nonlinear functionals, with improved theoretical guarantees under both deterministic and random sampling schemes.

AI · Neutral · arXiv – CS AI · Apr 14 · 5/10

Wolkowicz-Styan Upper Bound on the Hessian Eigenspectrum for Cross-Entropy Loss in Nonlinear Smooth Neural Networks

Researchers derive a closed-form upper bound on the Hessian eigenspectrum of the cross-entropy loss in smooth nonlinear neural networks using the Wolkowicz-Styan bound. This analytical approach avoids numerical eigenvalue computation and expresses loss sharpness as a function of the network parameters, the orthogonality of the training samples, and the layer dimensions, advancing theoretical understanding of the relationship between loss geometry and generalization.
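
The Wolkowicz-Styan inequality itself is short. For a real symmetric n×n matrix A, let m = tr(A)/n and s² = tr(A²)/n − m². Then the largest eigenvalue satisfies λ_max ≤ m + s√(n−1), so only two traces are needed. The sketch below applies the bound to a simpler stand-in: the Hessian of cross-entropy with respect to the logits, diag(p) − ppᵀ, rather than the full parameter Hessian the paper bounds. It then compares the bound with a power-iteration estimate. All helper names are hypothetical, and this is not the authors' derivation.

```python
import math


def wolkowicz_styan_upper(a):
    """Wolkowicz-Styan upper bound on lambda_max of a real symmetric n x n matrix.

    Uses only two traces: m = tr(A)/n and s^2 = tr(A^2)/n - m^2,
    giving lambda_max <= m + s * sqrt(n - 1). No eigendecomposition needed.
    """
    n = len(a)
    tr_a = sum(a[i][i] for i in range(n))
    # For symmetric A, tr(A @ A) equals the sum of squared entries.
    tr_a2 = sum(a[i][j] ** 2 for i in range(n) for j in range(n))
    m = tr_a / n
    s = math.sqrt(max(tr_a2 / n - m * m, 0.0))  # clamp tiny negative round-off
    return m + s * math.sqrt(n - 1)


def softmax(z):
    shift = max(z)  # subtract the max logit for numerical stability
    e = [math.exp(v - shift) for v in z]
    total = sum(e)
    return [v / total for v in e]


def ce_logit_hessian(p):
    """Hessian of cross-entropy w.r.t. the logits: diag(p) - p p^T (label-independent)."""
    n = len(p)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(n)] for i in range(n)]


def rayleigh_max(a, iters=500):
    """Power-iteration estimate of lambda_max for a symmetric PSD matrix.

    The returned Rayleigh quotient never exceeds the true lambda_max.
    """
    n = len(a)
    v = [(-1) ** i * (i + 1.0) for i in range(n)]  # arbitrary non-constant start
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * av[i] for i in range(n))  # v is unit-norm


if __name__ == "__main__":
    H = ce_logit_hessian(softmax([2.0, 0.5, -1.0, 0.0]))
    print("power-iteration lambda_max:", rayleigh_max(H))
    print("Wolkowicz-Styan upper bound:", wolkowicz_styan_upper(H))
```

Because the bound needs only tr(A) and tr(A²), it can be evaluated directly from matrix entries. This is the sense in which such bounds sidestep eigenvalue computation. How the paper expresses these traces in terms of parameters, sample orthogonality, and layer dimensions is not reproduced here.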