AI · Neutral · arXiv - CS AI · 4d ago · 4/104
Scaling Laws of SignSGD in Linear Regression: When Does It Outperform SGD?
Researchers analyzed scaling laws for signSGD optimization in machine learning, comparing it to standard SGD under a power-law random features model. The study identifies unique effects in signSGD that can lead to steeper compute-optimal scaling laws than SGD in noise-dominant regimes.
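For readers unfamiliar with the two optimizers being compared, the following is a minimal sketch of one SGD step and one signSGD step on a least-squares linear regression loss. It is not taken from the paper: the function names, the mean-squared-error loss, and the `power_law_features` helper (Gaussian features whose j-th coordinate has variance j^(-alpha)) are illustrative assumptions, and the paper's exact random features model may differ.

```python
import numpy as np

def minibatch_grad(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    return X.T @ (X @ w - y) / X.shape[0]

def sgd_step(w, X, y, lr):
    """Plain SGD: move against the minibatch gradient, scaled by lr."""
    return w - lr * minibatch_grad(w, X, y)

def signsgd_step(w, X, y, lr):
    """signSGD: keep only the sign of each gradient coordinate,
    so every coordinate with a nonzero gradient moves by exactly lr."""
    return w - lr * np.sign(minibatch_grad(w, X, y))

def power_law_features(rng, n, d, alpha):
    """Hypothetical data model (an assumption, not the paper's definition):
    Gaussian features whose j-th coordinate has variance j ** (-alpha)."""
    scales = np.arange(1, d + 1, dtype=float) ** (-alpha / 2.0)
    return rng.standard_normal((n, d)) * scales

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = power_law_features(rng, n=32, d=8, alpha=1.5)
    w_star = rng.standard_normal(8)          # hypothetical target weights
    y = X @ w_star + 0.1 * rng.standard_normal(32)  # noisy labels
    w0 = np.zeros(8)
    print("SGD step:    ", sgd_step(w0, X, y, lr=0.1))
    print("signSGD step:", signsgd_step(w0, X, y, lr=0.1))
```

The only difference between the two updates is the `np.sign` call: signSGD discards gradient magnitudes, so its step size per coordinate is set entirely by the learning rate.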