🧠 AI · ⚪ Neutral · Importance 6/10

When Do Attention Circuits Form? Developmental Trajectories of Capability and Attention-Sink Emergence Across Three 1B-Class Architectures

arXiv – CS AI | Yongzhong Xu | June 2, 2026 at 04:00 AM

🤖 AI Summary

Researchers tracked how attention-head circuits form during training in three 1B-class language models and found that induction circuits and attention-sink circuits emerge as separate transitions, roughly an order of magnitude (10-20x) apart in training tokens. The study identifies an architectural constraint, the absence of BOS-attractor heads in the earliest layers (L0/L1), and shows that circuit identification stabilizes within only 0.3-2% of total training tokens, which makes developmental studies of transformer circuits considerably cheaper to run.
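The summary contrasts induction circuits with attention-sink (BOS-attractor) circuits without saying how either is scored. As a rough illustration, here is a minimal Python sketch of two per-head scores commonly used in interpretability work: the mean attention mass a head places on the BOS token, and an induction score measured on a sequence whose tokens repeat with a fixed period. The function names, the `[BOS, x_1..x_L, x_1..x_L]` layout, and the toy patterns are assumptions for illustration; the paper's exact metrics and thresholds may differ.

```python
"""Hypothetical per-head attention scores (illustrative, not the paper's exact metrics)."""
from typing import List, Tuple

# attn[q][k] is the attention weight from query position q to key position k.
# Position 0 holds the BOS token; rows are assumed causal and to sum to 1.
Matrix = List[List[float]]


def bos_score(attn: Matrix) -> float:
    """Mean attention mass on BOS, skipping query 0 (which can only see BOS itself)."""
    rows = attn[1:]
    return sum(row[0] for row in rows) / len(rows)


def induction_score(attn: Matrix, period: int) -> float:
    """Assumes the layout [BOS, x_1..x_L, x_1..x_L] with L = period.

    For a query at position q in the repeated half, an induction head attends to
    the token right after the previous occurrence of the current token, which sits
    at key position q - period + 1.
    """
    queries = range(period + 1, len(attn))
    return sum(attn[q][q - period + 1] for q in queries) / len(queries)


def toy_patterns(period: int) -> Tuple[Matrix, Matrix]:
    """Build an idealized BOS-sink head and an idealized induction head."""
    n = 2 * period + 1
    sink = [[1.0 if k == 0 else 0.0 for k in range(n)] for _ in range(n)]
    induction = []
    for q in range(n):
        row = [0.0] * n
        if q > period:
            row[q - period + 1] = 1.0  # token after the previous occurrence
        else:
            row[0] = 1.0  # nothing to copy yet, so park attention on BOS
        induction.append(row)
    return sink, induction


sink, ind = toy_patterns(period=5)
print(bos_score(sink), induction_score(sink, 5))  # 1.0 0.0
print(bos_score(ind), induction_score(ind, 5))    # 0.5 1.0
```

In a real model these scores would be averaged over many sequences and compared against thresholds; which thresholds the paper uses to call a head a "BOS-attractor" or an "induction head" is not stated in the summary.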

Analysis

This mechanistic interpretability study advances understanding of how transformer circuits develop during pretraining, moving beyond treating capability emergence as a single phase transition. By tracking circuits systematically across multiple architectures and datasets, the researchers show that attention-sink formation and capability-circuit formation follow distinct developmental timelines: in DCLM-trained models, induction heads emerge substantially before BOS-attractor heads stabilize. This separation challenges simplified phase-transition narratives and suggests that language model development involves multiple asynchronous capability acquisitions.
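One simple way to make "distinct developmental timelines" concrete is to record, for each circuit metric, the first checkpoint at which it crosses a threshold, and then compare those token counts. The sketch below assumes hypothetical thresholds and uses made-up trajectories, not data from the paper; it only shows how an order-of-magnitude gap in onset tokens would be computed.

```python
"""Hypothetical onset comparison between two circuit metrics."""
from typing import Optional, Sequence


def onset_tokens(tokens: Sequence[float], scores: Sequence[float],
                 threshold: float) -> Optional[float]:
    """Return the token count of the first checkpoint whose score reaches `threshold`."""
    for t, s in zip(tokens, scores):
        if s >= threshold:
            return t
    return None  # the metric never crossed the threshold


# Made-up trajectories on a log-spaced checkpoint grid (NOT results from the paper).
tokens = [1e8, 3e8, 1e9, 3e9, 1e10, 3e10]
induction = [0.02, 0.10, 0.45, 0.60, 0.62, 0.63]
bos_sink = [0.05, 0.06, 0.08, 0.20, 0.55, 0.70]

t_ind = onset_tokens(tokens, induction, threshold=0.4)  # hypothetical threshold
t_bos = onset_tokens(tokens, bos_sink, threshold=0.5)   # hypothetical threshold
print(f"induction onset: {t_ind:.0e} tokens, BOS-sink onset: {t_bos:.0e} tokens")
print(f"gap: {t_bos / t_ind:.0f}x")
```

A single threshold crossing is sensitive to checkpoint spacing, so a gradual ramp (as reported for Pythia and OLMoE) and a sharp transition (as reported for OLMo) can yield similar onset estimates; inspecting the full trajectory shape matters as much as the onset number.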

The work builds on established mechanistic interpretability frameworks but extends them to developmental questions: when and how do specific circuit types crystallize during training? The finding that some architectural properties, such as the zero-BOS floor in layers L0 and L1, behave as hard constraints rather than learned behaviors has implications for model design. Because circuit identification stabilizes within the first 0.3-2% of training tokens, circuits can be identified from early checkpoints alone, which substantially lowers the compute cost of circuit-discovery research and puts such studies within reach of more groups.
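The claim that circuit identification stabilizes early can be operationalized in several ways. The sketch below assumes one of them: rank heads by a per-head score at each checkpoint, take the top-k set, and find the earliest checkpoint after which that set stays close (by Jaccard overlap) to the final checkpoint's set. The value of `k`, the overlap threshold, and the toy checkpoint scores are hypothetical; the paper may use a different criterion.

```python
"""Hypothetical check for when an identified head set stops changing."""
from typing import Dict, List, Set, Tuple

Head = Tuple[int, int]  # (layer, head index)
Checkpoint = Tuple[float, Dict[Head, float]]  # (tokens seen, per-head score)


def top_k_heads(scores: Dict[Head, float], k: int) -> Set[Head]:
    return set(sorted(scores, key=lambda h: scores[h], reverse=True)[:k])


def jaccard(a: Set[Head], b: Set[Head]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def stabilization_fraction(checkpoints: List[Checkpoint], k: int,
                           min_overlap: float) -> float:
    """Earliest fraction of training from which every later top-k set overlaps the
    final checkpoint's top-k set by at least `min_overlap` (Jaccard)."""
    final_tokens, final_scores = checkpoints[-1]
    final_set = top_k_heads(final_scores, k)
    stable_from = final_tokens
    # Walk backwards from the end until the overlap first drops below the threshold.
    for tokens, scores in reversed(checkpoints):
        if jaccard(top_k_heads(scores, k), final_set) < min_overlap:
            break
        stable_from = tokens
    return stable_from / final_tokens


# Toy scores for four heads at four checkpoints (illustrative only).
ckpts: List[Checkpoint] = [
    (1e8, {(0, 0): 0.5, (1, 2): 0.4, (2, 1): 0.1, (3, 3): 0.0}),
    (1e9, {(0, 0): 0.1, (1, 2): 0.2, (2, 1): 0.6, (3, 3): 0.5}),
    (1e10, {(0, 0): 0.1, (1, 2): 0.1, (2, 1): 0.7, (3, 3): 0.65}),
    (1e11, {(0, 0): 0.1, (1, 2): 0.1, (2, 1): 0.8, (3, 3): 0.7}),
]
print(stabilization_fraction(ckpts, k=2, min_overlap=0.9))
```

With these toy numbers the head set locks in after 1% of training; the result depends entirely on the chosen score, `k`, the overlap threshold, and how densely early checkpoints are sampled.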

For AI developers and researchers, these results offer practical guidance for architecture design and training. Knowing that circuits emerge at different phases opens the possibility of targeted interventions during pretraining to influence capability development. The consistency of these patterns across architectures and training corpora suggests they may reflect general properties of transformer learning dynamics rather than dataset-specific artifacts. Future work might exploit this timeline separation to probe causal relationships between circuits and to develop more interpretable training procedures.

Key Takeaways

→ Induction and attention-sink circuits form in separate transitions, 10-20x apart in training tokens, rather than in a single phase transition
→ Architectural properties such as the absence of BOS-attractor heads in the earliest layers (L0/L1) are structural constraints, not learned features
→ Circuit identification stabilizes within just 0.3-2% of total training tokens, enabling efficient mechanistic analysis
→ BOS-attractor emergence takes different shapes across models: gradual ramps in Pythia and OLMoE but a sharp phase transition in OLMo
→ Elevated participation-ratio spectral signals predict induction-head formation before capability thresholds are crossed
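The participation ratio is a standard effective-dimensionality measure: for eigenvalues λ_i of a covariance matrix, PR = (Σλ_i)² / Σλ_i², which ranges from 1 (all variance along one direction) to the full dimension (isotropic variance). The summary does not say which representation's spectrum is tracked, so this sketch assumes a set of activation vectors. It uses the identity tr(C)² / ‖C‖_F² for a symmetric covariance C, which avoids an eigendecomposition and keeps the example dependency-free.

```python
"""Participation ratio of an activation covariance (dependency-free sketch)."""
from typing import List


def participation_ratio(acts: List[List[float]]) -> float:
    """PR = (sum lambda_i)^2 / sum lambda_i^2 over covariance eigenvalues.

    For a symmetric covariance C, sum lambda_i = tr(C) and sum lambda_i^2 = tr(C^2),
    which equals the sum of squared entries of C, so no eigendecomposition is needed.
    Requires at least two samples and nonzero variance.
    """
    n, d = len(acts), len(acts[0])
    means = [sum(row[j] for row in acts) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in acts]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    trace = sum(cov[i][i] for i in range(d))
    frob_sq = sum(c * c for row in cov for c in row)
    return trace * trace / frob_sq


# Points on a line: all variance in one direction, so PR is 1.
print(participation_ratio([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]))
# Variance split evenly over two directions, so PR is 2.
print(participation_ratio([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]))
```

How an "elevated" PR is turned into an early-warning signal (window, baseline, threshold) is not described in the summary and is left open here.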

#mechanistic-interpretability #transformer-circuits #language-models #attention-heads #model-development #training-dynamics #circuit-formation #interpretability

