🧠 AI | ⚪ Neutral | Importance 7/10

The Hamilton-Jacobi Theory of Deep Learning

arXiv – CS AI | Jose Marie Antonio Miñoza, Erika Fille T. Legara, Christopher P. Monterola | May 29, 2026 at 04:00 AM

🤖 AI Summary

Researchers establish a mathematical framework connecting neural network training to Hamilton-Jacobi partial differential equations, showing that gradient descent searches through solutions to viscous PDEs. This theoretical unification applies across major architectures including residual networks and transformers, with implications for understanding generalization, adversarial robustness, and interpretability.

Analysis

This paper bridges deep learning and classical mathematical physics by showing that neural network training, traditionally viewed through optimization and statistical learning, can be cast as solving Hamilton-Jacobi initial-value problems. The correspondence is exact for log-sum-exp layers and structural for residual networks, transformers, and recurrent architectures: each architecture discretizes the same family of viscous PDEs, with an architecture-specific Hamiltonian.
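As background for the log-sum-exp case, one classical instance of such a correspondence is the Cole-Hopf transform. The sketch below is standard PDE material and is not taken from the paper: the quadratic Hamiltonian and the symbols u, u_0, and ε (the viscosity) are illustrative assumptions, and the paper's own Hamiltonians may differ.

```latex
% Viscous Hamilton-Jacobi initial-value problem (quadratic Hamiltonian assumed)
\partial_t u + \tfrac{1}{2}\,|\nabla u|^2 = \varepsilon\,\Delta u,
\qquad u(x,0) = u_0(x).

% The substitution u = -2\varepsilon \log\varphi reduces it to the heat equation
\partial_t \varphi = \varepsilon\,\Delta\varphi,
\qquad \varphi(x,0) = e^{-u_0(x)/(2\varepsilon)}.

% Solving with the heat kernel gives a continuous log-sum-exp (soft-min) formula
u(x,t) = -2\varepsilon \log \int_{\mathbb{R}^d} (4\pi\varepsilon t)^{-d/2}
  \exp\!\Big(-\frac{1}{2\varepsilon}\Big[\frac{|x-y|^2}{2t} + u_0(y)\Big]\Big)\,dy.

% Vanishing-viscosity limit (Hopf-Lax formula), under suitable conditions on u_0
\lim_{\varepsilon\to 0} u(x,t) = \min_{y}\Big[\frac{|x-y|^2}{2t} + u_0(y)\Big].
```

Replacing the integral with a finite sum over candidate points y gives a log-sum-exp expression of the kind the exact correspondence refers to.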

The theoretical framework unifies four perspectives (network computation, tropical algebra, PDE dynamics, and convex optimization) through a single deformation parameter. According to the authors, this lets several previously opaque phenomena be characterized exactly. The minimax optimal generalization rate O(n^{-1/(d+2)}) emerges directly from PDE theory rather than from statistical learning arguments. Adversarial robustness becomes a tunable property controlled by the viscosity parameter, which offers a mechanistic explanation for why some architectures prove more robust than others.
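The role of a single deformation parameter can be illustrated with a temperature-scaled log-sum-exp. This is a minimal sketch under stated assumptions, not the paper's construction: the names `smooth_max`, `softmax_weights`, and `eps` are hypothetical, and `eps` stands in for the deformation (viscosity-like) parameter. As `eps` shrinks, the smooth maximum approaches the hard maximum of tropical (max-plus) algebra, and its gradient, a softmax, concentrates on the largest input.

```python
import math

def smooth_max(xs, eps):
    """Temperature-scaled log-sum-exp: eps * log(sum(exp(x / eps)))."""
    m = max(xs)  # shift by the max before exponentiating, for numerical stability
    return m + eps * math.log(sum(math.exp((x - m) / eps) for x in xs))

def softmax_weights(xs, eps):
    """Gradient of smooth_max with respect to xs: a softmax at temperature eps."""
    m = max(xs)
    exps = [math.exp((x - m) / eps) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

xs = [1.0, 2.0, 3.5]
for eps in (1.0, 0.1, 0.01):
    # smooth_max always lies between max(xs) and max(xs) + eps * log(len(xs))
    print(eps, smooth_max(xs, eps), softmax_weights(xs, eps))
```

Larger `eps` spreads the weights across inputs and smooths the function. Smaller `eps` recovers the piecewise-linear max, which is one way to picture a smoothness knob of the kind the summary attributes to viscosity.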

The derivation of backpropagation as the co-state equation of a Hamiltonian system validates decades of intuition about adjoint methods while grounding them in optimal control theory. The closed-form influence function with softmax attribution weights provides computationally efficient O(N) interpretability without approximation, addressing a critical gap in neural network transparency.
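The co-state view of backpropagation can be sketched on a toy residual network. Everything specific here is an assumption made for illustration and not the paper's setup: a scalar state, the update x_{k+1} = x_k + h * tanh(w_k * x_k), a squared-error loss, and the hypothetical function names. The backward loop is the discrete adjoint (co-state) recursion from optimal control, and a finite-difference check confirms that it yields the same gradients as perturbing each weight directly.

```python
import math

def forward(x0, ws, h):
    """Residual forward pass x_{k+1} = x_k + h * tanh(w_k * x_k); returns all states."""
    xs = [x0]
    for w in ws:
        xs.append(xs[-1] + h * math.tanh(w * xs[-1]))
    return xs

def loss(x0, ws, h, y):
    """Squared-error loss on the final state."""
    return 0.5 * (forward(x0, ws, h)[-1] - y) ** 2

def adjoint_gradients(x0, ws, h, y):
    """Backward co-state recursion; returns dL/dw_k for every layer."""
    xs = forward(x0, ws, h)
    p = xs[-1] - y  # terminal co-state p_N = dL/dx_N
    grads = [0.0] * len(ws)
    for k in reversed(range(len(ws))):
        sech2 = 1.0 - math.tanh(ws[k] * xs[k]) ** 2
        grads[k] = p * h * xs[k] * sech2   # dL/dw_k, using p_{k+1}
        p = p * (1.0 + h * ws[k] * sech2)  # p_k = p_{k+1} * dx_{k+1}/dx_k
    return grads

def finite_difference_gradients(x0, ws, h, y, delta=1e-6):
    """Central-difference reference gradients, one weight at a time."""
    grads = []
    for k in range(len(ws)):
        up = ws[:k] + [ws[k] + delta] + ws[k + 1:]
        down = ws[:k] + [ws[k] - delta] + ws[k + 1:]
        grads.append((loss(x0, up, h, y) - loss(x0, down, h, y)) / (2 * delta))
    return grads

ws = [0.5, -1.2, 0.8]
print(adjoint_gradients(0.7, ws, 0.1, 1.0))
print(finite_difference_gradients(0.7, ws, 0.1, 1.0))
```

The adjoint pass costs one backward sweep regardless of the number of weights, whereas the finite-difference check needs two forward passes per weight. That cost gap is the practical reason adjoint methods are used.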

These theoretical advances matter because they replace black-box intuitions with rigorous mathematical structures. Understanding neural networks as PDE discretizations enables better architecture design, improved training algorithms informed by numerical analysis of PDEs, and principled approaches to robustness and interpretability. The framework suggests new research directions in applying classical PDE analysis to contemporary deep learning problems.

Key Takeaways

→ Neural network training is cast as solving viscous Hamilton-Jacobi PDEs, unifying optimization, algebra, and classical physics
→ The framework yields exact minimax generalization bounds and closed-form influence functions for neural network interpretability
→ Adversarial robustness emerges as a tunable PDE property controlled by viscosity, offering mechanistic design principles
→ Backpropagation corresponds exactly to Hamiltonian co-state equations, grounding adjoint methods in optimal control theory
→ The theory applies across residual networks, transformers, and RNNs/LSTMs, establishing universal discretization principles