arXiv – CS AI · 8h ago
The Effect of Mini-Batch Noise on the Implicit Bias of Adam
Researchers present a theoretical framework for how mini-batch gradient noise shapes the implicit bias of the Adam optimizer toward sharper or flatter regions of the loss landscape. They find that the optimal momentum hyperparameters shift with batch size: small batches favor the default (β₁, β₂) = (0.9, 0.999), while larger batches benefit from bringing β₁ and β₂ closer together.
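For context on what β₁ and β₂ control, here is a minimal textbook sketch of a single Adam update step, not the paper's analysis: β₁ is the decay rate of the first-moment (momentum) estimate and β₂ that of the second-moment estimate. The gradient function and hyperparameter values below are illustrative assumptions.

```python
def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on a scalar parameter; returns (theta, m, v)."""
    m = beta1 * m + (1 - beta1) * grad       # EMA of gradients (first moment)
    v = beta2 * v + (1 - beta2) * grad**2    # EMA of squared gradients (second moment)
    m_hat = m / (1 - beta1**t)               # bias correction for early steps
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (v_hat**0.5 + eps)
    return theta, m, v

# Toy objective theta^2 (gradient 2*theta), run with the default betas;
# per the summary, larger batches may instead favor closer values, e.g. (0.9, 0.95).
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    theta, m, v = adam_step(theta, grad=2 * theta, m=m, v=v, t=t)
```

Changing `beta1`/`beta2` here only alters how quickly the two moving averages forget old gradients; the paper's contribution is characterizing how mini-batch noise makes the best choice depend on batch size.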