AINeutralarXiv – CS AI · 6h ago6/10
🧠
Weight Decay Improves Language Model Plasticity
Researchers demonstrate that weight decay during language model pretraining significantly improves model plasticity—the ability to adapt to downstream tasks through fine-tuning. The study reveals counterintuitive findings where higher weight decay produces weaker base models but stronger performance after task-specific training, challenging conventional approaches to hyperparameter optimization.