AIBullisharXiv – CS AI · 6h ago · 6/10
🧠
Pro-KLShampoo: Projected KL-Shampoo with Whitening Recovered by Orthogonalization
Researchers introduce Pro-KLShampoo, an improved optimizer for LLM pre-training that combines Kronecker-factored preconditioning with gradient orthogonalization. By exploiting the observed spike-and-flat eigenvalue structure in KL-Shampoo's preconditioners, Pro-KLShampoo achieves better validation loss, reduced memory usage, and faster training across multiple model scales.
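The combination described above, Kronecker-factored (Shampoo-style) preconditioning followed by gradient orthogonalization, can be sketched as below. This is a hypothetical illustration of the general idea, not the paper's exact algorithm: the function names, the Newton–Schulz iteration (as used in Muon-style optimizers), and all constants are assumptions.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5):
    # Approximately orthogonalize a matrix gradient G via Newton-Schulz
    # iteration; X converges toward U V^T from the SVD G = U S V^T.
    # (Illustrative choice, not necessarily the paper's procedure.)
    X = G / (np.linalg.norm(G) + 1e-8)  # scale singular values into (0, 1]
    for _ in range(steps):
        X = 1.5 * X - 0.5 * (X @ X.T) @ X
    return X

def shampoo_like_step(G, L, R, lr=0.02, beta=0.9, eps=1e-8):
    # Kronecker-factored preconditioning: keep left/right second-moment
    # factors L, R, precondition G by their inverse fourth roots, then
    # "recover whitening" by orthogonalizing the result. Hypothetical
    # sketch of the Pro-KLShampoo idea; constants are assumptions.
    L = beta * L + (1 - beta) * (G @ G.T)
    R = beta * R + (1 - beta) * (G.T @ G)

    def inv_quarter(M):
        # Matrix M^{-1/4} via eigendecomposition, clamped for stability.
        w, V = np.linalg.eigh(M)
        return V @ np.diag(np.maximum(w, eps) ** -0.25) @ V.T

    preconditioned = inv_quarter(L) @ G @ inv_quarter(R)
    update = newton_schulz_orthogonalize(preconditioned)
    return -lr * update, L, R
```

In this sketch the orthogonalization step flattens the spectrum of the preconditioned gradient, which is one plausible reading of exploiting a spike-and-flat eigenvalue structure; the actual mechanism is detailed in the paper.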