AI · Neutral · arXiv – CS AI · 4h ago · 6/10
Training for the Model You Return: Improving Optimization for Iterate-Averaged Language Models
Researchers present PACE, an optimizer-level training method that targets the iterate-averaged weights a run actually returns rather than the final training weights. The authors formulate the choice of update as an optimal-control problem and wrap AdamW with a clipped pull toward an exponential moving average (EMA) of the weights. They report improved theoretical convergence and empirical gains on 1-2B-parameter models and in GPT-2 pretraining.
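To make the "AdamW wrapped with a clipped pull toward an EMA" idea concrete, here is a minimal pure-Python sketch on a toy quadratic loss. It is not the paper's algorithm: the summary does not give the exact update rule, so the order of operations (AdamW step, then EMA update, then pull), the norm-based clipping, and every name and hyperparameter below (`pull_toward_ema`, `strength`, `max_pull`, `train_toy`, and so on) are assumptions made for illustration only.

```python
import math


def clip_norm(vec, max_norm):
    """Rescale vec so its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0 or norm <= max_norm:
        return list(vec)
    scale = max_norm / norm
    return [v * scale for v in vec]


def ema_update(ema, weights, decay):
    """Standard exponential moving average of the iterates."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema, weights)]


def pull_toward_ema(weights, ema, strength, max_pull):
    """Move weights toward ema; the move is clipped to max_pull in norm.

    Assumption: the actual PACE pull may be defined differently.
    """
    pull = [strength * (e - w) for w, e in zip(weights, ema)]
    pull = clip_norm(pull, max_pull)
    return [w + p for w, p in zip(weights, pull)]


def adamw_step(weights, grads, state, lr, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.0):
    """One textbook AdamW step with decoupled weight decay."""
    state["t"] += 1
    t = state["t"]
    new_weights = []
    for i, (w, g) in enumerate(zip(weights, grads)):
        state["m"][i] = beta1 * state["m"][i] + (1.0 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1.0 - beta2) * g * g
        m_hat = state["m"][i] / (1.0 - beta1 ** t)
        v_hat = state["v"][i] / (1.0 - beta2 ** t)
        w = w - lr * weight_decay * w
        w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_weights.append(w)
    return new_weights


def train_toy(steps=300, lr=0.05, decay=0.9, strength=0.1, max_pull=0.01):
    """Minimize 0.5 * ||w||^2 and return the EMA weights, not the last iterate."""
    weights = [1.0, -2.0]
    ema = list(weights)
    state = {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}
    for _ in range(steps):
        grads = list(weights)  # gradient of 0.5 * ||w||^2 is w itself
        weights = adamw_step(weights, grads, state, lr)
        ema = ema_update(ema, weights, decay)
        weights = pull_toward_ema(weights, ema, strength, max_pull)
    return ema


if __name__ == "__main__":
    averaged = train_toy()
    print("returned (averaged) weights:", averaged)
```

The sketch returns `ema` rather than `weights`, mirroring the headline's point that the averaged model is the one handed back. Clipping keeps the pull from overwhelming the base AdamW step when the live iterate and its average are far apart.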