AI · Neutral · arXiv – CS AI · 5h ago · 6/10
Critical Windows of Complexity Control: When Transformers Decide to Reason or Memorize
Researchers identify a critical window during training in which a Transformer settles on either memorization or reasoning. On compositional tasks, applying weight decay only during a specific window spanning 25% of training matches the performance of applying it for the full run. The window's boundaries are sharp: shifting its timing by just 100 optimization steps swings accuracy between chance-level performance and robust reasoning.
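
As a rough illustration of what restricting weight decay to a window covering 25% of training could look like, here is a minimal Python sketch. The function name `windowed_weight_decay`, the decay value 0.1, the 10,000-step run, and the window start at 25% of training are all hypothetical; the summary does not state where the window falls or which hyperparameters the authors used.

```python
def windowed_weight_decay(step, total_steps, start_frac, width_frac=0.25, decay=0.1):
    """Return the weight-decay coefficient to use at `step`.

    The coefficient is `decay` inside the half-open window [start, end)
    and 0.0 everywhere else. All default values are illustrative assumptions.
    """
    start = int(start_frac * total_steps)
    end = start + int(width_frac * total_steps)
    return decay if start <= step < end else 0.0


# Hypothetical run: 10,000 steps, window opening a quarter of the way in.
total = 10_000
schedule = [windowed_weight_decay(s, total, start_frac=0.25) for s in range(total)]

# Count how many steps actually apply weight decay.
active_steps = sum(1 for wd in schedule if wd > 0)
```

In a PyTorch training loop, one way to use such a schedule would be to write the returned value into each entry of `optimizer.param_groups` under the `"weight_decay"` key before calling `optimizer.step()`. Shifting `start_frac` by a small amount would then move the window by a fixed number of steps, which is the kind of timing change the summary describes.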