
Optimizer-Induced Low-Dimensional Drift and Transverse Dynamics in Transformer Training

arXiv – CS AI | Yongzhong Xu
AI Summary

Researchers analyzed training trajectories in small transformer models, finding that parameter updates organize into a dominant drift direction with transverse dynamics. The study reveals that different optimizers (AdamW vs SGD) create substantially different trajectory geometries, with AdamW developing multi-dimensional structures while SGD produces more linear evolution.

Key Takeaways
  • Parameter updates in transformer training organize into a dominant drift direction with residual transverse dynamics.
  • Trajectory PCA shows that a single direction captures most of the cumulative parameter movement early in training.
  • The AdamW optimizer creates multi-dimensional drift structures, while SGD variants produce nearly collinear parameter evolution.
  • Instantaneous gradients show little alignment with the dominant direction, indicating it emerges from accumulated optimizer updates.
  • Optimizer choice significantly shapes learning trajectory structure beyond what loss values alone reveal.
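The trajectory-PCA analysis the takeaways describe can be sketched on synthetic data: stack parameter snapshots into a matrix, run PCA (via SVD) on the centered trajectory, and compare the top component's explained variance with the cosine alignment of individual updates. The drift-plus-noise setup below is an illustrative toy, not the paper's actual code; the step sizes, dimensions, and noise scale are assumptions chosen so that per-step updates align only weakly with the direction that nonetheless dominates the cumulative trajectory.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for saved parameter snapshots theta_t: each update has a small
# component along a fixed drift direction plus larger isotropic noise.
D, T = 1000, 300                       # parameter dimension, number of steps
drift = rng.normal(size=D)
drift /= np.linalg.norm(drift)
steps = 0.1 * drift + 0.01 * rng.normal(size=(T, D))
snapshots = np.cumsum(steps, axis=0)   # theta_t = theta_0 + sum of updates

# Trajectory PCA: center the snapshots, then SVD. The top right-singular
# vector is the direction capturing the most cumulative parameter movement.
X = snapshots - snapshots.mean(axis=0)
_, S, Vt = np.linalg.svd(X, full_matrices=False)
explained = S[0] ** 2 / np.sum(S ** 2)
v1 = Vt[0]

# Alignment of instantaneous updates with the dominant direction:
# low per-step cosine, even though v1 explains most trajectory variance.
updates = np.diff(snapshots, axis=0)
cos = updates @ v1 / np.linalg.norm(updates, axis=1)
print(f"top-PC explained variance: {explained:.3f}")
print(f"mean |cos(update, v1)|:   {np.abs(cos).mean():.3f}")
```

In this toy run the top principal component explains most of the trajectory variance while the mean per-step cosine stays well below 1, mirroring the paper's observation that the dominant direction emerges from accumulated updates rather than from any single gradient step.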