🧠 AI | 🟢 Bullish | Importance 7/10

Moonwalk: Inverse-Forward Differentiation

arXiv – CS AI | Dmitrii Krylov, Armin Karamzade, Roy Fox | March 26, 2026 at 04:00 AM

🤖 AI Summary

Researchers introduce Moonwalk, an algorithm that targets backpropagation's memory bottleneck by removing the need to store intermediate activations during neural network training. The method uses vector-inverse-Jacobian products in submersive networks to reconstruct gradients in a forward sweep, which allows networks more than twice as deep to be trained under the same memory budget.
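
To make the central idea concrete, here is a short worked identity. The notation ($x_l$, $f_l$, $J_l$, $g_l$) is assumed for illustration and is not taken from the paper; it shows only the standard linear algebra that lets a gradient be pushed forward through a layer whose Jacobian is surjective (a submersion).

```latex
% Layer l maps x_l \in \mathbb{R}^n to x_{l+1} = f_l(x_l) \in \mathbb{R}^m,
% with Jacobian J_l = \partial x_{l+1} / \partial x_l \in \mathbb{R}^{m \times n}.
% Backpropagation moves the gradient g_l = \partial L / \partial x_l backward:
\[
  g_l = J_l^{\top} g_{l+1}.
\]
% If J_l has full row rank m (f_l is a submersion), then J_l J_l^{\top} is
% invertible and the same relation can be solved in the forward direction:
\[
  g_{l+1} = \left( J_l J_l^{\top} \right)^{-1} J_l \, g_l .
\]
% If J_l is not surjective, J_l^{\top} discards the component of g_{l+1}
% lying in \ker J_l^{\top} (isomorphic to the cokernel of J_l), so only the
% component of g_{l+1} in the image of J_l can be recovered from g_l.
```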

Key Takeaways

→ Moonwalk removes backpropagation's memory bottleneck by avoiding the storage of intermediate activations during the forward pass.
→ The method introduces submersive networks, in which gradients can be reconstructed exactly without storing activations.
→ Vector-inverse-Jacobian products enable gradient flow inversion outside the cokernel of layer Jacobians.
→ Fragmental gradient checkpointing records only the minimal residuals needed for non-submersive layers.
→ The implementation matches backpropagation's runtime while training networks more than twice as deep under the same memory budget.
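
The takeaways above can be illustrated with a toy example. This is a minimal sketch, not the authors' implementation: it assumes a single linear layer y = W x whose Jacobian W is surjective, and every name in it (`vijp`, `solve_2x2`, the matrix `W`, the vectors) is hypothetical. It checks that the output-side gradient can be recovered from the input-side gradient, and that the weight gradient can then be formed from an activation that is available during the forward sweep.

```python
# Toy sketch (assumed setup): one linear layer y = W x, W is 2x3 with rank 2,
# so the layer is a submersion and its Jacobian is W itself.

def matvec(A, v):
    """Matrix-vector product for lists of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def solve_2x2(M, r):
    """Solve M z = r for a 2x2 matrix M using Cramer's rule."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [(d * r[0] - b * r[1]) / det, (a * r[1] - c * r[0]) / det]

def vijp(W, g_in):
    """Hypothetical vector-inverse-Jacobian product for a linear layer:
    recover g_out from g_in = W^T g_out by solving (W W^T) g_out = W g_in."""
    return solve_2x2(matmul(W, transpose(W)), matvec(W, g_in))

W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]          # full row rank, so W W^T is invertible
x = [1.0, 2.0, 3.0]             # layer input, recomputed in the forward sweep
g_out = [3.0, -2.0]             # "true" gradient w.r.t. the layer output

# What backpropagation would hand to the previous layer.
g_in = matvec(transpose(W), g_out)

# Forward direction: reconstruct the output-side gradient from g_in alone.
recovered = vijp(W, g_in)

# Weight gradient dL/dW = g_out x^T, built from the recovered gradient
# and the current activation; no stored activations are needed here.
grad_W = [[g * xi for xi in x] for g in recovered]

print(recovered)
print(grad_W)
```

In this sketch the solve is exact because W has full row rank; for a layer without that property only part of the gradient could be recovered this way, which is where the takeaways mention recording extra residuals.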