AI Summary
Researchers report that independently trained transformers, although they end up with different weights, converge to the same compact 'algorithmic cores': low-dimensional subspaces that are necessary for task performance. These invariant structures persist across training runs and model scales, suggesting that transformer computation is organized around shared algorithmic patterns rather than implementation-specific details.
Key Takeaways
- Independently trained transformers learn different weights but converge to identical algorithmic cores necessary for task performance.
- Markov-chain transformers embed 3D cores in orthogonal subspaces yet recover identical transition spectra.
- Modular-addition transformers discover compact cyclic operators during grokking that later expand during the memorization-to-generalization transition.
- GPT-2 models control subject-verb agreement through a single axis that can invert grammatical number when flipped.
- Low-dimensional invariants persist across training runs and scales, suggesting shared computational structures in transformers.
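The Markov-chain takeaway can be made concrete with a toy NumPy sketch. It is an illustration only, not the paper's procedure: the transition matrix `P`, the hidden width `d`, and the helpers `embed` and `core_spectrum` are all hypothetical. Two stand-in "models" place the same 3-state operator in mutually orthogonal 3D subspaces of a wider space, so their weight matrices differ while the recovered spectra coincide.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state Markov chain (row-stochastic); not taken from the paper.
P = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

d = 16  # assumed hidden width of the toy "models"

# Orthonormal basis of R^d; two disjoint column blocks span orthogonal 3D subspaces.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
B1, B2 = Q[:, :3], Q[:, 3:6]

def embed(P, B):
    """Return a d x d operator that acts as P inside span(B) and as zero elsewhere."""
    return B @ P @ B.T

def core_spectrum(W, k=3):
    """Return the k largest-magnitude eigenvalues of W, sorted by real part."""
    eig = np.linalg.eigvals(W)
    top = eig[np.argsort(-np.abs(eig))[:k]]
    return np.sort(top.real)

W1, W2 = embed(P, B1), embed(P, B2)

print("subspaces orthogonal:", np.allclose(B1.T @ B2, 0.0))
print("weights differ:      ", not np.allclose(W1, W2))
print("spectrum, model 1:   ", core_spectrum(W1))
print("spectrum, model 2:   ", core_spectrum(W2))
```

The spectra match because each embedded operator is orthogonally similar to a block matrix containing `P`, and similarity transforms preserve eigenvalues. That is the sense in which a spectrum is an invariant of the computation rather than of any particular set of weights.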
#transformers #algorithmic-cores #mechanistic-interpretability #language-models #gpt-2 #ai-research #neural-networks #computational-structures
Read Original via arXiv (CS AI)