#loss-landscape News & Analysis

7 articles tagged with #loss-landscape. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · May 9 · 7/10

Are Flat Minima an Illusion?

A research paper challenges the prevailing assumption that flat minima in neural network loss landscapes improve generalization, arguing instead that 'weakness'—the volume of function-compatible parameter configurations—is the true driver of generalization. The author demonstrates that flatness is reparameterization-dependent and thus not causally responsible for better performance, while weakness remains invariant across different parameterizations.
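
The reparameterization dependence of flatness can be illustrated with a toy model. The sketch below is not taken from the paper: the two-weight model `f(x) = w2 * w1 * x`, the single data point, and the helper names `loss`, `hessian_trace`, and `rescale` are all assumptions made for illustration. It rescales the weights in a way that leaves the function unchanged and compares a common curvature-based sharpness proxy (the Hessian trace) before and after.

```python
def loss(w1, w2):
    """Squared error of the toy model f(x) = w2 * w1 * x on the single point (x, y) = (1, 1)."""
    return (w1 * w2 - 1.0) ** 2


def hessian_trace(w1, w2, eps=1e-4):
    """Trace of the loss Hessian, estimated with central finite differences."""
    base = loss(w1, w2)
    d11 = (loss(w1 + eps, w2) - 2.0 * base + loss(w1 - eps, w2)) / eps**2
    d22 = (loss(w1, w2 + eps) - 2.0 * base + loss(w1, w2 - eps)) / eps**2
    return d11 + d22


def rescale(w1, w2, alpha):
    """Reparameterize without changing the function: (w1, w2) -> (alpha * w1, w2 / alpha)."""
    return alpha * w1, w2 / alpha


original = (1.0, 1.0)
rescaled = rescale(*original, alpha=10.0)

# Both points implement the same function and sit at zero loss.
print("loss:", loss(*original), loss(*rescaled))
# The curvature-based sharpness proxy nevertheless differs between them.
print("Hessian trace:", hessian_trace(*original), hessian_trace(*rescaled))
```

At any minimum of this toy loss the Hessian trace is analytically `2 * (w1**2 + w2**2)`, so the two function-identical points have traces of about 4 and 200.02. The example only shows that this sharpness measure is not invariant; it does not compute the paper's 'weakness' quantity.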

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

A Geometric Characterization of the Stationary Plateau for Two-Layer Neural Networks

Researchers characterize the geometric structure of loss landscape plateaus in two-layer neural networks, focusing on how duplicating hidden neurons creates affine sets of stationary points. The study classifies whether these plateau points are local minima or saddles based on an 'inner Hessian' matrix, revealing that splitting a minimum can produce mixed or all-saddle plateaus, while splitting saddles always yields saddle plateaus.
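
A minimal sketch of the neuron-duplication construction, assuming a scalar-input network with `tanh` hidden units (the activation, the helper names `two_layer` and `split_neuron`, and the split parameter `lam` are illustrative choices, not details from the paper). Copying a hidden neuron and sharing its output weight between the two copies leaves the network function unchanged for every value of `lam`, which is why the split parameters trace out an affine set of equal loss.

```python
import math


def two_layer(x, hidden_w, out_a):
    """Two-layer network f(x) = sum_j a_j * tanh(w_j * x) with scalar input."""
    return sum(a * math.tanh(w * x) for w, a in zip(hidden_w, out_a))


def split_neuron(hidden_w, out_a, index, lam):
    """Duplicate hidden neuron `index`; the copies get output weights lam*a and (1-lam)*a."""
    new_w = list(hidden_w) + [hidden_w[index]]
    new_a = list(out_a) + [(1.0 - lam) * out_a[index]]
    new_a[index] = lam * out_a[index]
    return new_w, new_a


weights, outputs = [0.5, -1.2], [2.0, 0.7]
for lam in (-1.0, 0.0, 0.3, 1.0, 2.5):
    w_split, a_split = split_neuron(weights, outputs, index=0, lam=lam)
    # The wider network computes the same value as the original for every lam.
    print(lam, two_layer(0.8, w_split, a_split), two_layer(0.8, weights, outputs))
```

The sketch demonstrates only the function-preserving line of parameters. Classifying points on that line as minima or saddles requires the second-order analysis described in the summary and is not attempted here.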

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Unveiling Multi-regime Patterns in SciML: Distinct Failure Modes and Regime-specific Optimization

Researchers identify a consistent three-regime structure in scientific machine learning (SciML) models, demonstrating that neural networks exhibit distinct failure modes and training behaviors depending on hyperparameter settings. The study reveals that optimization methods are regime-specific with no universal solution, providing a diagnostic framework to improve model robustness across physics-informed neural networks, neural operators, and neural ODEs.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

Model Merging on Loss Landscape: A Geometry Perspective

Researchers introduce EpiMer, a novel framework for merging machine learning models by treating it as a geometric optimization problem on Riemannian manifolds. The method uses low-rank task vectors and curvature information to improve knowledge integration without retraining, demonstrating superior performance when merging fine-tuned CLIP-ViT models across multiple image classification tasks.
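
For readers unfamiliar with task vectors, the sketch below shows plain task-arithmetic merging, the simple baseline in which each task vector is the difference between fine-tuned and pretrained weights. It is not EpiMer: it uses no low-rank structure, curvature information, or Riemannian geometry, and the function names, the scaling factor `alpha`, and the tiny weight dictionaries are hypothetical.

```python
def task_vector(pretrained, finetuned):
    """Task vector: fine-tuned weights minus pretrained weights, per parameter tensor."""
    return {
        name: [f - p for p, f in zip(pretrained[name], finetuned[name])]
        for name in pretrained
    }


def merge(pretrained, finetuned_models, alpha=1.0):
    """Add the scaled sum of all task vectors back onto the pretrained weights."""
    merged = {name: list(values) for name, values in pretrained.items()}
    for finetuned in finetuned_models:
        tau = task_vector(pretrained, finetuned)
        for name in merged:
            merged[name] = [m + alpha * t for m, t in zip(merged[name], tau[name])]
    return merged


# Toy weights standing in for one shared base model and two fine-tuned copies.
base = {"w": [1.0, 2.0]}
model_a = {"w": [1.5, 2.0]}
model_b = {"w": [1.0, 1.0]}

print(merge(base, [model_a, model_b], alpha=1.0))
print(merge(base, [model_a, model_b], alpha=0.5))
```

No retraining is involved in this baseline either; the merged weights are computed directly from the checkpoints.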

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Optimizer-Induced Mode Connectivity: From AdamW to Muon

Researchers demonstrate that neural network solutions trained with specific optimizers like AdamW and Muon form connected sets at large network widths, revealing optimizer-dependent structure in loss landscapes. The study shows that different optimizers converge to disconnected solutions with provable loss barriers in small networks, while empirically in GPT-2 pretraining, same-optimizer paths preserve model spectra differently than cross-optimizer paths.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

The Effect of Mini-Batch Noise on the Implicit Bias of Adam

Researchers present a theoretical framework showing how mini-batch noise in training with the Adam optimizer affects its implicit bias toward sharper or flatter regions of the loss landscape. They find that the optimal momentum hyperparameters shift with batch size: small batches favor the default (β₁, β₂) = (0.9, 0.999), while larger batches benefit from β₁ and β₂ values that are closer to each other.
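
To show where β₁ and β₂ enter, here is a minimal sketch of the standard Adam update with bias correction for a single scalar parameter. The function name `adam_step`, the quadratic objective, and the second hyperparameter pair (0.9, 0.95) are illustrative assumptions; the pair is only an example of closer values and is not a recommendation taken from the paper.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count used for bias correction."""
    m = beta1 * m + (1.0 - beta1) * grad         # first-moment (momentum) estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad  # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)               # correct the bias from zero initialization
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Minimize f(p) = p**2 (gradient 2p) with the default betas and a closer pair.
for beta1, beta2 in [(0.9, 0.999), (0.9, 0.95)]:
    p, m, v = 1.0, 0.0, 0.0
    for t in range(1, 101):
        p, m, v = adam_step(p, 2.0 * p, m, v, t, beta1=beta1, beta2=beta2)
    print((beta1, beta2), p)
```

This deterministic loop has no mini-batch noise, so it does not reproduce the paper's effect; it only makes concrete which two decay rates the summary refers to.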

AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

Practical Bayesian Inference for Speech SNNs: Uncertainty and Loss-Landscape Smoothing

Researchers demonstrate that applying Bayesian inference to Spiking Neural Networks (SNNs) for speech processing smooths the irregular loss landscape caused by threshold-based spike generation. Testing on speech datasets shows improved performance metrics and more regular predictive landscapes compared to deterministic approaches.