AI · Neutral · arXiv, CS AI · 10h ago · 7/10
The Geometry of Multi-Task Grokking: Transverse Instability, Superposition, and Weight Decay Phase Structure
Researchers studied multi-task grokking in Transformers and report five key phenomena, including a staggered order in which tasks generalize and a phase structure governed by weight decay. The study shows how the models construct compact superposition subspaces in parameter space, with weight decay acting as compression pressure.