AIBullish · arXiv – CS AI · 6h ago · 7/10
🧠
When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds
Researchers provide a theoretical proof that sign-based optimization algorithms such as SignSGD outperform standard SGD under specific conditions involving ℓ1-norm stationarity and sparse gradient noise, with complexity improvements that scale with the problem dimension d. The analysis bridges theory and practice by demonstrating these advantages during GPT-2 pretraining, explaining why sign-based methods succeed in large language model training, a success that previously lacked theoretical justification.
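The core difference between the two methods is a one-line change to the update rule: SignSGD discards gradient magnitudes and keeps only the per-coordinate sign, which is why its guarantees are naturally stated in the ℓ1 norm. A minimal NumPy sketch (illustrative only, not the paper's code; step size `lr` is arbitrary):

```python
import numpy as np

def sgd_step(theta, grad, lr=0.1):
    # Standard SGD: step against the raw stochastic gradient.
    return theta - lr * grad

def signsgd_step(theta, grad, lr=0.1):
    # SignSGD: step against the element-wise sign of the gradient,
    # discarding magnitude information in every coordinate.
    return theta - lr * np.sign(grad)

theta = np.array([1.0, -2.0, 0.5])
grad = np.array([0.3, -4.0, 0.0])

print(sgd_step(theta, grad))      # [ 0.97 -1.6   0.5 ]
print(signsgd_step(theta, grad))  # [ 0.9  -1.9   0.5 ]
```

Note how the large outlier coordinate (-4.0) dominates the SGD step but contributes only a unit-magnitude move under SignSGD, which is the intuition behind its robustness to sparse, heavy-tailed gradient noise.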