#gradient-flow News & Analysis

6 articles tagged with #gradient-flow. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 5 · 7/10

SpanNorm: Reconciling Training Stability and Performance in Deep Transformers

Researchers introduce SpanNorm, a novel normalization technique for deep Transformer architectures that combines the training stability of PreNorm with the performance benefits of PostNorm. The method uses spanning residual connections and PostNorm-style computation to prevent gradient instability and representation collapse, demonstrating improvements in both dense and Mixture-of-Experts model configurations.
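
The summary does not spell out SpanNorm's formulation, so the sketch below shows only the two standard placements it is described as reconciling: PreNorm computes x + F(LN(x)), PostNorm computes LN(x + F(x)). The tanh "sublayer" is a hypothetical stand-in for attention or an MLP. Stacking identical blocks shows the known trade-off: the PreNorm residual stream is never normalized, so its scale keeps growing with depth, while PostNorm re-normalizes after every block.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (learned gain/bias omitted)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def sublayer(x):
    """Hypothetical stand-in for an attention or MLP sublayer."""
    return [math.tanh(2.0 * v) for v in x]

def pre_norm_block(x):
    # PreNorm: x + F(LN(x)); the residual stream itself is never normalized.
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]

def post_norm_block(x):
    # PostNorm: LN(x + F(x)); the sum is re-normalized after every block.
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])

def std(x):
    """Population standard deviation of a vector."""
    mean = sum(x) / len(x)
    return math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))

x0 = [0.5, -1.0, 2.0, 0.1]
pre, post = x0, x0
for _ in range(8):  # stack eight identical blocks
    pre, post = pre_norm_block(pre), post_norm_block(post)
```

After eight blocks, std(pre) has grown past std(x0), while std(post) stays at about 1.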

AI · Neutral · arXiv – CS AI · May 12 · 7/10

Flag Varieties: A Geometric Framework for Deep Network Alignment

Researchers establish a unified geometric framework using flag varieties to explain alignment phenomena in deep neural networks, proving that subspace intersection dimension is the fundamental observable governing how weight matrices organize themselves. The work provides theoretical foundations for previously empirical observations about gradient flow, Neural Collapse, and representation similarity, with implications for understanding how neural networks learn.
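
The observable highlighted here, the dimension of the intersection of two subspaces, follows from ranks via dim(U ∩ V) = dim U + dim V − dim(U + V). The minimal sketch below computes it exactly for subspaces given by spanning vectors. Applying it to, say, the leading singular subspaces of two weight matrices is an assumption about how such a measurement might be used, not the paper's procedure.

```python
from fractions import Fraction

def rank(vectors):
    """Dimension of the span of `vectors` (lists of numbers), via exact Gaussian elimination."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot available in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def intersection_dim(U, V):
    """dim(span U intersect span V) = dim U + dim V - dim(U + V)."""
    return rank(U) + rank(V) - rank(U + V)  # U + V concatenates the spanning sets

U = [[1, 0, 0], [0, 1, 0]]  # span{e1, e2}
V = [[0, 1, 0], [0, 0, 1]]  # span{e2, e3}
```

Here intersection_dim(U, V) is 1, since the two planes share only the e2 axis.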

AI · Bullish · arXiv – CS AI · May 11 · 7/10

MatryoshkaLoRA: Learning Accurate Hierarchical Low-Rank Representations for LLM Fine-Tuning

Researchers introduce MatryoshkaLoRA, a novel training framework that improves upon Low-Rank Adaptation (LoRA) for efficient large language model fine-tuning by learning hierarchical low-rank representations through a strategically placed diagonal scaling matrix. The method enables dynamic rank selection with minimal accuracy loss and introduces AURAC, a new evaluation metric for hierarchical adapters, addressing a key limitation in current parameter-efficient fine-tuning approaches.
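
A standard LoRA update is ΔW = B A with B of shape d_out × r and A of shape r × d_in. As a minimal sketch, assume the diagonal scaling matrix sits between them, ΔW = B diag(s) A, which is a guess about "strategically placed" and not the paper's confirmed design. Under that assumption the update is a sum of rank-1 terms s_i b_i a_iᵀ, so keeping only the first k terms yields nested adapters of rank at most k from a single set of weights. The function name `lora_delta` is hypothetical.

```python
def lora_delta(B, s, A, k):
    """Hypothetical nested LoRA update using only the first k rank-1 components:
    delta = sum_{i<k} s[i] * outer(B[:, i], A[i, :]).
    B is d_out x r, s holds the r diagonal scales, A is r x d_in (lists of rows)."""
    d_out, d_in = len(B), len(A[0])
    delta = [[0.0] * d_in for _ in range(d_out)]
    for i in range(k):  # truncating to k components gives a rank <= k update
        for row in range(d_out):
            for col in range(d_in):
                delta[row][col] += B[row][i] * s[i] * A[i][col]
    return delta

B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]          # d_out = 3, r = 2
s = [2.0, 0.5]                                    # assumed diagonal scaling between B and A
A = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]  # r = 2, d_in = 4

rank1 = lora_delta(B, s, A, 1)  # smallest adapter in the hierarchy
rank2 = lora_delta(B, s, A, 2)  # full-rank adapter; contains rank1 as a prefix
```

Each larger rank only adds terms on top of the smaller one, which is the nesting property that makes rank selection at deployment time cheap.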

AI · Bullish · arXiv – CS AI · Mar 12 · 7/10

Gradient Flow Drifting: Generative Modeling via Wasserstein Gradient Flows of KDE-Approximated Divergences

Researchers introduce Gradient Flow Drifting, a new mathematical framework for generative AI models that connects the Drifting Model to Wasserstein gradient flows of KL divergence under kernel density estimation. The framework includes a mixed-divergence strategy to avoid mode collapse and extends to Riemannian manifolds for improved semantic space applications.
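
For KL(q ‖ p), the Wasserstein gradient flow moves each sample of q with velocity v(x) = ∇log p(x) − ∇log q(x). The sketch below is a generic 1-D illustration of that flow with both densities replaced by Gaussian kernel density estimates; it is not the paper's Drifting Model, and it omits the mixed-divergence strategy and the Riemannian extension. Bandwidth h, step size lr, and the toy data are illustrative assumptions.

```python
import math

def kde_score(x, samples, h):
    """d/dx log of a Gaussian KDE built on `samples` with bandwidth h, evaluated at x."""
    logw = [-(x - y) ** 2 / (2 * h * h) for y in samples]
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]  # numerically stabilized kernel weights
    z = sum(w)
    return sum(wi * (y - x) for wi, y in zip(w, samples)) / (z * h * h)

def drift_step(particles, data, h, lr):
    """One explicit Euler step of the KL(q || p) Wasserstein gradient flow with KDE densities:
    attraction toward the data KDE minus repulsion from the particles' own KDE."""
    return [x + lr * (kde_score(x, data, h) - kde_score(x, particles, h)) for x in particles]

data = [2.5, 3.0, 3.5]         # samples from the target p
particles = [-1.0, -0.5, 0.0]  # initial generated samples from q
for _ in range(500):
    particles = drift_step(particles, data, h=1.0, lr=0.1)
```

The repulsion term is what keeps particles from piling onto a single mode; with plain KL it is the only spreading force in this sketch.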

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

Pretraining Recurrent Networks without Recurrence

Researchers propose Supervised Memory Training (SMT), a novel method for training recurrent neural networks that replaces sequential backpropagation through time with parallel, supervised learning on memory state transitions. By leveraging a Transformer encoder to generate training labels, SMT achieves stable gradient propagation and improved performance on language and sequence modeling tasks without the parallelism constraints of traditional RNN training.
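
The core idea of training on state transitions rather than unrolled sequences can be sketched with a hypothetical scalar linear cell h_{t+1} = a·h_t + b·x_t. Here the "teacher" states come from a known recurrence, standing in for the Transformer-encoder labels the summary mentions; the cell, the teacher, and both function names are illustrative assumptions, not SMT itself. Because every (h_t, x_t) → h_{t+1} pair is an independent regression example, the fit is a single least-squares solve with no backpropagation through time.

```python
def teacher_states(xs, a=0.8, b=0.5):
    """Hypothetical teacher: a known linear recurrence stands in for encoder-generated labels."""
    h, out = 0.0, [0.0]
    for x in xs:
        h = a * h + b * x
        out.append(h)
    return out  # len(xs) + 1 states, starting from h_0 = 0

def fit_transition(hs, xs):
    """Fit h_{t+1} ~ a*h_t + b*x_t over all transitions at once (2x2 normal equations)."""
    prev, nxt = hs[:-1], hs[1:]
    shh = sum(h * h for h in prev)
    shx = sum(h * x for h, x in zip(prev, xs))
    sxx = sum(x * x for x in xs)
    shy = sum(h * y for h, y in zip(prev, nxt))
    sxy = sum(x * y for x, y in zip(xs, nxt))
    det = shh * sxx - shx * shx
    a = (shy * sxx - shx * sxy) / det
    b = (shh * sxy - shx * shy) / det
    return a, b

xs = [1.0, -0.5, 0.3, 2.0, -1.2, 0.7]
hs = teacher_states(xs)
a_hat, b_hat = fit_transition(hs, xs)
```

Since the toy teacher is exactly linear, the fit recovers a = 0.8 and b = 0.5 up to floating-point error.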

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Fitting Multilinear Polynomials for Logic Gate Networks

Researchers propose a novel approach to training learnable logic gate networks by representing 2-input Boolean gates as multilinear polynomials in 4-dimensional space, reducing a vector-quantization problem from 16 to 4 parameters per neuron. The CovJac method outperforms the Soft-Mix baseline, especially as networks grow deeper, by addressing the gradient starvation that causes performance to collapse in deep architectures.
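
Any 2-input Boolean gate f(a, b) can be written as the multilinear polynomial c0 + c1·a + c2·b + c3·a·b, so the 16 possible gates become 16 points in a 4-dimensional coefficient space. The sketch below derives those coefficients from a truth table and evaluates them on relaxed inputs in [0, 1]. How the paper parameterizes and trains the coefficients (including CovJac) is not shown; the function names are illustrative.

```python
from itertools import product

def gate_coeffs(table):
    """Multilinear coefficients (c0, c1, c2, c3) with f(a, b) = c0 + c1*a + c2*b + c3*a*b,
    from a truth table given as (f(0,0), f(0,1), f(1,0), f(1,1))."""
    f00, f01, f10, f11 = table
    return (f00, f10 - f00, f01 - f00, f11 - f10 - f01 + f00)

def evaluate(c, a, b):
    """Evaluate the polynomial; also defined for relaxed inputs a, b in [0, 1]."""
    return c[0] + c[1] * a + c[2] * b + c[3] * a * b

# All 16 two-input gates map to 16 distinct points in R^4.
ALL_GATES = [gate_coeffs(t) for t in product((0, 1), repeat=4)]

AND = gate_coeffs((0, 0, 0, 1))  # (0, 0, 0, 1):  a*b
XOR = gate_coeffs((0, 1, 1, 0))  # (0, 1, 1, -2): a + b - 2ab
```

On binary inputs each polynomial reproduces its gate exactly; on soft inputs, AND at a = b = 0.5 evaluates to 0.25.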