AIBullisharXiv – CS AI · 15h ago7/10
🧠
Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers
Researchers introduce a symmetry-compatible principle for neural network optimizer design that aligns gradient updates with the geometric properties of different parameter types. The approach yields specialized update rules for embeddings, language model heads, SwiGLU MLPs, and mixture-of-experts routers, demonstrating improved validation loss and training stability across multiple language model architectures compared to standard AdamW optimization.