The Geometry of Multi-Task Grokking: Transverse Instability, Superposition, and Weight Decay Phase Structure
AI Summary
Researchers studied multi-task grokking in Transformers and identified five key phenomena, including a staggered order of generalization across tasks and a phase structure induced by weight decay. The study shows how these models construct compact superposition subspaces in parameter space, with weight decay acting as a compression pressure.
Key Takeaways
- Multi-task grokking follows a consistent order across random seeds: multiplication generalizes first, then squaring, then addition.
- Optimization trajectories remain confined to low-dimensional execution manifolds, and the size of the orthogonal (off-manifold) defects predicts generalization.
- Weight decay creates distinct dynamical regimes that systematically affect the grokking timescale and model performance.
- Final solutions occupy only 4-8 principal directions, yet they are distributed across full-rank weights and are fragile to perturbations.
- Removing less than 10% of the orthogonal gradient components eliminates grokking, though dual-task models show partial recovery under extreme deletion.
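The geometric claims above rest on two standard operations that the summary does not spell out: measuring how many principal directions the weight trajectory actually spans, and projecting gradients onto (or out of) that low-dimensional subspace. The sketch below illustrates both on synthetic data; the trajectory, dimensions, and 99%-variance threshold are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

# Hypothetical illustration: treat each training checkpoint's flattened
# weights as a vector, run PCA (via SVD) on the trajectory, and count how
# many principal directions capture most of the variance. The paper
# reportedly finds only 4-8 such directions in real runs.
rng = np.random.default_rng(0)

# Synthetic trajectory: 200 checkpoints of a 500-dim weight vector that
# mostly moves inside a 5-dim subspace, plus small isotropic noise.
basis = np.linalg.qr(rng.normal(size=(500, 5)))[0]      # 5 orthonormal directions
coeffs = np.cumsum(rng.normal(size=(200, 5)), axis=0)   # random walk in that subspace
trajectory = coeffs @ basis.T + 0.01 * rng.normal(size=(200, 500))

centered = trajectory - trajectory.mean(axis=0)
_, s, vt = np.linalg.svd(centered, full_matrices=False)
var_ratio = np.cumsum(s**2) / np.sum(s**2)
k = int(np.searchsorted(var_ratio, 0.99)) + 1           # directions for 99% variance

# The ablation described above ("removing orthogonal gradient components")
# amounts to projecting each gradient onto the top-k subspace and
# discarding the rest:
top_k = vt[:k]                                          # (k, 500) orthonormal rows
grad = rng.normal(size=500)
grad_in_manifold = top_k.T @ (top_k @ grad)             # in-subspace component only
grad_orthogonal = grad - grad_in_manifold               # the part whose removal is tested
```

For this synthetic walk, `k` comes out close to the planted dimension of 5; on a real Transformer one would build `trajectory` from flattened checkpoint weights instead.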
#grokking #transformers #multi-task-learning #weight-decay #generalization #neural-networks #geometric-analysis #ai-research
Read Original via arXiv – CS AI