y0news
AI · Neutral · arXiv - CS AI · 4h ago

Optimizer-Induced Low-Dimensional Drift and Transverse Dynamics in Transformer Training

Researchers analyzed the training trajectories of small transformer models and found that parameter updates organize around a dominant drift direction, with transverse dynamics superimposed on it. The study shows that the choice of optimizer (AdamW vs. SGD) produces substantially different trajectory geometries: AdamW trajectories develop multi-dimensional structure, while SGD trajectories evolve more linearly.
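One way to make the "drift plus transverse dynamics" picture concrete is a simple trajectory decomposition. The sketch below is an illustration under stated assumptions, not the study's method. It stacks flattened parameter checkpoints into a matrix and takes the top right singular vector of the displacement-from-initialization matrix as the drift direction. It treats everything orthogonal to that direction as transverse motion. It then uses a participation ratio of the squared singular values as a rough count of how many directions the path occupies. The function name `drift_decomposition`, the uncentered SVD, and the participation-ratio metric are hypothetical choices. The two trajectories are synthetic stand-ins, not real AdamW or SGD runs.

```python
import numpy as np


def drift_decomposition(snapshots: np.ndarray):
    """Split a parameter trajectory into a dominant drift direction plus transverse motion.

    Hypothetical helper. snapshots: array of shape (T, D), one flattened
    parameter vector per checkpoint. Assumes the trajectory actually moves.
    """
    # Displacement of every checkpoint relative to the first one (initialization).
    disp = snapshots - snapshots[0]
    # Uncentered SVD (an assumption): rows are checkpoints, columns are parameters.
    _, s, vt = np.linalg.svd(disp, full_matrices=False)
    drift_dir = vt[0]                                      # unit vector carrying the most squared displacement
    drift_coords = disp @ drift_dir                        # progress along the drift at each checkpoint
    transverse = disp - np.outer(drift_coords, drift_dir)  # motion orthogonal to the drift
    energy = s ** 2
    drift_fraction = energy[0] / energy.sum()              # share of squared displacement along drift_dir
    # Participation ratio: close to 1 for a straight path, larger when motion spreads over more directions.
    participation_ratio = energy.sum() ** 2 / (energy ** 2).sum()
    return drift_dir, drift_coords, transverse, drift_fraction, participation_ratio


# Synthetic trajectories (hypothetical, not real training runs).
rng = np.random.default_rng(0)
T, D = 200, 50
t = np.linspace(0.0, 1.0, T)[:, None]
# Three orthonormal directions in a 50-dimensional "parameter space".
q, _ = np.linalg.qr(rng.standard_normal((D, 3)))
a, b, c = q.T

# Near-linear path: steady drift along `a` plus small isotropic noise.
linear_path = 10.0 * t * a + 0.01 * rng.standard_normal((T, D))
# Curved path: the same drift plus a larger oscillation in the (b, c) plane.
phase = 6.0 * np.pi * t
curved_path = (10.0 * t * a
               + 3.0 * np.sin(phase) * b
               + 3.0 * (np.cos(phase) - 1.0) * c
               + 0.01 * rng.standard_normal((T, D)))

lin_dir, _, lin_trans, lin_frac, lin_pr = drift_decomposition(linear_path)
_, _, _, cur_frac, cur_pr = drift_decomposition(curved_path)
print(f"linear: drift fraction {lin_frac:.3f}, participation ratio {lin_pr:.2f}")
print(f"curved: drift fraction {cur_frac:.3f}, participation ratio {cur_pr:.2f}")
```

In this toy setup, the straight path puts nearly all of its displacement along one direction, with a participation ratio near 1. The curved path spreads its displacement over several directions. Centering the checkpoints first (ordinary PCA) would instead measure spread around the mean path rather than net displacement from initialization. Which convention matches the study's analysis is not stated here, so treat the uncentered choice as an assumption.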