y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#loss-landscape News & Analysis

5 articles tagged with #loss-landscape. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

5 articles
AINeutralarXiv – CS AI · May 97/10
🧠

Are Flat Minima an Illusion?

A research paper challenges the prevailing assumption that flat minima in neural network loss landscapes improve generalization, arguing instead that 'weakness'—the volume of function-compatible parameter configurations—is the true driver of generalization. The author demonstrates that flatness is reparameterization-dependent and thus not causally responsible for better performance, while weakness remains invariant across different parameterizations.

AINeutralarXiv – CS AI · 2d ago6/10
🧠

Model Merging on Loss Landscape: A Geometry Perspective

Researchers introduce EpiMer, a novel framework for merging machine learning models by treating it as a geometric optimization problem on Riemannian manifolds. The method uses low-rank task vectors and curvature information to improve knowledge integration without retraining, demonstrating superior performance when merging fine-tuned CLIP-ViT models across multiple image classification tasks.

AINeutralarXiv – CS AI · May 126/10
🧠

Optimizer-Induced Mode Connectivity: From AdamW to Muon

Researchers demonstrate that neural network solutions trained with specific optimizers like AdamW and Muon form connected sets at large network widths, revealing optimizer-dependent structure in loss landscapes. The study shows that different optimizers converge to disconnected solutions with provable loss barriers in small networks, while empirically in GPT-2 pretraining, same-optimizer paths preserve model spectra differently than cross-optimizer paths.

AINeutralarXiv – CS AI · May 116/10
🧠

The Effect of Mini-Batch Noise on the Implicit Bias of Adam

Researchers present a theoretical framework showing how mini-batch noise in Adam optimizer training affects the implicit bias toward sharper or flatter loss landscape regions, finding that optimal momentum hyperparameters shift based on batch size—small batches favor the default (0.9, 0.999) settings while larger batches benefit from closer β₁ and β₂ values.

AINeutralarXiv – CS AI · Apr 136/10
🧠

Practical Bayesian Inference for Speech SNNs: Uncertainty and Loss-Landscape Smoothing

Researchers demonstrate that applying Bayesian inference to Spiking Neural Networks (SNNs) for speech processing smooths the irregular loss landscape caused by threshold-based spike generation. Testing on speech datasets shows improved performance metrics and more regular predictive landscapes compared to deterministic approaches.