AINeutralarXiv – CS AI · 14h ago6/10
🧠
On the Optimizer Dependence of Neural Scaling Laws
Researchers demonstrate that the scaling exponent in neural scaling laws varies systematically based on optimizer choice, with preconditioned optimizers achieving 2.6x larger exponents than standard gradient descent in controlled experiments. The findings suggest scaling-law forecasts must account for optimizer selection, though the practical impact on large-scale LLM training remains uncertain.