AI · Neutral · arXiv – CS AI · 4h ago
Optimizer-Induced Low-Dimensional Drift and Transverse Dynamics in Transformer Training
Researchers analyzed training trajectories of small transformer models and found that parameter updates organize into a single dominant drift direction, with smaller transverse dynamics around it. The study also shows that the choice of optimizer substantially changes trajectory geometry: AdamW develops multi-dimensional transverse structure, while SGD produces a more nearly linear evolution.
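The summary does not say how the drift and transverse components were measured. The sketch below is an illustration only and makes several assumptions. Checkpoints are treated as flattened parameter vectors. The drift direction is taken to be the net start-to-end displacement, which is a simplification; the paper may use PCA or another decomposition. Each checkpoint's displacement is then split into a part along that direction and a transverse remainder. The function names `drift_decomposition` and `drift_fraction` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vector:
    return [x - y for x, y in zip(a, b)]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def drift_decomposition(
    checkpoints: Sequence[Sequence[float]],
) -> Tuple[Vector, List[float], List[float]]:
    """Hypothetical helper: project each checkpoint's displacement from the first
    checkpoint onto the net drift direction (first -> last checkpoint) and
    return (drift unit vector, along-drift coordinates, transverse norms)."""
    if len(checkpoints) < 2:
        raise ValueError("need at least two checkpoints")
    origin = list(checkpoints[0])
    net = _sub(checkpoints[-1], origin)
    norm = math.sqrt(_dot(net, net))
    if norm == 0.0:
        raise ValueError("zero net displacement; drift direction undefined")
    drift = [x / norm for x in net]  # assumed drift direction (unit vector)

    along: List[float] = []
    transverse: List[float] = []
    for theta in checkpoints:
        d = _sub(theta, origin)
        a = _dot(d, drift)  # component along the drift direction
        # Pythagoras: |d|^2 = a^2 + |d_perp|^2; clamp tiny negative round-off
        perp_sq = max(_dot(d, d) - a * a, 0.0)
        along.append(a)
        transverse.append(math.sqrt(perp_sq))
    return drift, along, transverse


def drift_fraction(checkpoints: Sequence[Sequence[float]]) -> float:
    """Share of total squared displacement that lies along the drift direction
    (1.0 means a perfectly straight trajectory)."""
    _, along, transverse = drift_decomposition(checkpoints)
    total = sum(a * a + t * t for a, t in zip(along, transverse))
    return sum(a * a for a in along) / total if total > 0 else 0.0


# Toy 2-D "trajectories" (made-up data, not results from the paper)
straight = [[0.0, 0.0], [1.0, 0.1], [2.0, -0.1], [3.0, 0.0]]
curved = [[0.0, 0.0], [1.0, 1.0], [2.0, 1.0], [3.0, 0.0]]
print(round(drift_fraction(straight), 4), drift_fraction(curved))
```

Under this toy measure, a nearly linear path such as `straight` scores close to 1, while a path with sustained sideways excursions such as `curved` scores lower (here 0.875). The "more linear" SGD trajectories and the "multi-dimensional" AdamW trajectories described above would be expected to separate along roughly this axis. A single scalar cannot capture how many transverse dimensions are active, though. For that, PCA of the transverse residuals would be a natural extension.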