Contribution Weights: A Geometrical Analysis of Self-Attention Transformers
Researchers introduce Contribution Weights, a new metric for analyzing transformer attention that accounts for value vector geometry alongside attention weights. The approach more accurately identifies semantically critical tokens than traditional attention-based metrics and reveals that attention sinks actively suppress information rather than passively storing excess attention.
This research addresses a fundamental limitation in how AI researchers interpret Large Language Model behavior. Attention weight analysis has dominated LLM interpretability work, but the new Contribution Weights metric demonstrates that analyzing attention in isolation misses crucial geometric properties of the neural representations being aggregated. By incorporating value magnitude and directional alignment with layer output, this projection-based approach provides a materially more accurate picture of information flow through transformer networks.
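One way to picture a projection-based score of this kind is sketched below. The summary does not give the paper's exact formula, so the definition here is an assumption: each attention-weighted value vector `a_ij * v_j` is projected onto the layer output `o_i` and normalized by the squared norm of `o_i`, so that each row of scores sums to 1. The function name `contribution_weights` and the toy numbers are hypothetical.

```python
import numpy as np

def contribution_weights(attn, values, eps=1e-12):
    """Hypothetical projection-based contribution score (not the paper's exact formula).

    attn:   (Q, T) attention weights, each row summing to 1
    values: (T, d) value vectors
    returns (Q, T) scores; each row sums to ~1 when the output is non-zero
    """
    out = attn @ values                            # (Q, d): o_i = sum_j a_ij * v_j
    dots = out @ values.T                          # (Q, T): <o_i, v_j>
    sq_norm = (out * out).sum(axis=-1, keepdims=True)  # (Q, 1): ||o_i||^2
    # Share of o_i explained by a_ij * v_j, measured along the direction of o_i.
    return attn * dots / (sq_norm + eps)

# Toy example: token 0 receives the most attention but has a tiny value vector.
attn = np.array([[0.7, 0.2, 0.1]])
values = np.array([[0.01, 0.0],
                   [1.0, 0.0],
                   [0.0, 1.0]])
contrib = contribution_weights(attn, values)
print("attention:   ", attn[0])
print("contribution:", contrib[0])
```

In this toy case the token with the largest attention weight ends up with the smallest contribution, because its value vector is nearly zero. A score can also be negative when a value vector points against the output direction, which plain attention weights cannot express.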
The work builds on a growing recognition that attention mechanisms are more complex than surface-level weight distributions suggest. Previous interpretability research has relied heavily on visualizing attention patterns, but these visualizations often obscure the actual computational importance of tokens. A geometrically informed metric represents an incremental but meaningful advance in mechanistic interpretability, the field that attempts to reverse-engineer how neural networks process information.
The discovery that attention sinks serve an active functional role rather than passive buffering has immediate implications for model design and optimization. If sinks actively stabilize representations by suppressing semantic drift in low-confidence tokens, this suggests their presence is computationally purposeful. Understanding this mechanism could inform architectural improvements and more efficient model training strategies. The convex relationship between sink rate and output norm provides a quantifiable handle on this previously opaque phenomenon.
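A toy calculation can make the sink-rate and output-norm relationship more concrete. The setup below is an assumption, not the paper's experiment: the sink rate `s` is taken to be the attention mass placed on a single sink token, the sink's value vector `v_sink` is assumed to have a small norm, and `u` stands for the attention-weighted average of the remaining value vectors. The output is then `o(s) = s * v_sink + (1 - s) * u`, and because the norm of an affine function of `s` is convex, the output norm is a convex function of the sink rate in this simplified model. All names and numbers are hypothetical.

```python
import numpy as np

def output_norm_vs_sink_rate(v_sink, u, rates):
    """Norm of o(s) = s * v_sink + (1 - s) * u for each sink rate s in `rates`."""
    rates = np.asarray(rates, dtype=float)[:, None]   # (R, 1) for broadcasting
    outputs = rates * v_sink + (1.0 - rates) * u      # (R, d)
    return np.linalg.norm(outputs, axis=-1)           # (R,)

# Hypothetical vectors: a near-zero sink value and a larger "semantic" average.
v_sink = np.array([0.05, -0.02, 0.0])
u = np.array([1.0, 0.5, -0.3])

rates = np.linspace(0.0, 1.0, 11)
norms = output_norm_vs_sink_rate(v_sink, u, rates)

# Discrete second differences are non-negative for a convex curve.
second_diff = norms[:-2] - 2.0 * norms[1:-1] + norms[2:]
print("norms:", np.round(norms, 4))
print("convex on this grid:", bool(np.all(second_diff >= -1e-12)))
```

In this sketch, shifting attention toward the sink shrinks the output norm, which is one simple way a sink could suppress the contribution of other tokens rather than merely absorb spare attention.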
For the broader AI research community, this work supports the case for richer analytical frameworks beyond standard attention visualization. As LLMs are increasingly deployed in critical applications, more faithful interpretability metrics become essential for understanding failure modes and building trustworthy systems. The methodology could extend to architectures beyond decoder-only models, potentially becoming standard practice in mechanistic interpretability research.
- The Contribution Weights metric outperforms traditional attention analysis by accounting for value vector geometry and directional alignment.
- Attention sinks actively suppress information rather than passively storing excess attention, with a convex relationship between sink rate and output norm.
- Geometric properties of aggregated value vectors significantly impact token importance measurements.
- The metric enables novel mechanistic insights into how transformers stabilize representations across different model architectures.
- Richer interpretability frameworks are essential for understanding LLM information flow beyond surface-level attention patterns.