Causal Dimensionality of Transformer Representations: Measurement, Scaling, and Layer Structure
Researchers introduce causal dimensionality (kappa), a measurable property quantifying how strongly transformer layers causally influence model outputs. As SAE width scales, representational capacity grows 15.6x while causal capacity grows only 4.35x. The metric is also invariant to model size, suggesting causal influence is a fundamental architectural property independent of parameter count.
This research addresses a foundational question in deep learning interpretability: what is the actual causal dimensionality of transformer representations? The authors develop a rigorous framework combining sparse autoencoders (SAEs) with attribution patching to measure how many independent features genuinely influence model outputs. Their key finding—the representational-causal wedge—reveals that while SAEs can extract increasingly rich feature dictionaries as width expands, the causal impact plateaus much earlier, saturating around 1,990 dimensions regardless of model scale.
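The measurement pipeline pairs SAE feature activations with attribution patching, a first-order approximation of activation patching. Below is a minimal sketch of that kind of estimator, assuming PyTorch tensors for clean and corrupted SAE activations and the gradient of an output metric; the function names, shapes, and threshold rule are illustrative assumptions, not the paper's actual code.

```python
# Hedged sketch of attribution patching over SAE features.
# Assumes f_clean / f_corrupt are SAE activations on clean and corrupted
# prompts, and metric_grad is d(metric)/d(SAE activations) from a backward
# pass on the corrupted run. Shapes: [batch, n_features].
import torch

def sae_feature_attributions(f_clean: torch.Tensor,
                             f_corrupt: torch.Tensor,
                             metric_grad: torch.Tensor) -> torch.Tensor:
    """First-order (attribution-patching) estimate of each feature's
    causal effect on the output metric, averaged over the batch."""
    # Taylor approximation of the patching effect: delta_activation * gradient.
    per_example = (f_clean - f_corrupt) * metric_grad
    return per_example.mean(dim=0)  # [n_features]

def count_causal_features(attributions: torch.Tensor, threshold: float) -> int:
    """Count features whose estimated causal effect exceeds a threshold."""
    return int((attributions.abs() > threshold).sum())
```

Counting features that clear an attribution threshold is one simple way to turn per-feature effects into a "how many features matter" number; the paper's kappa is a spectral quantity rather than a threshold count (see the sketch after the key-findings list).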
The invariance of causal dimensionality across model sizes (Gemma-2-2B and Gemma-2-9B) is particularly significant. It challenges the assumption that scaling parameter count linearly increases causal capacity, and suggests instead that transformer architectures have intrinsic dimensionality constraints. The constant kappa across network depths, paired with a 20x drop in attribution thresholds, indicates that information is compressed and refined through layers rather than expanded.
For the AI research community, this work provides practical methodology for understanding what transformers actually compute versus what they represent. SAE practitioners gain insight into optimal feature dictionary sizes relative to causal relevance. The synthetic ground-truth controls validate the measurement approach, building confidence in the framework. The consistency across architectural variations suggests kappa captures something fundamental about transformer computation rather than implementation details.
These findings have implications for efficiency and interpretability research. If causal dimensionality is truly architecture-invariant and scales sub-linearly, future model designs might exploit this property for improved parameter efficiency without sacrificing expressiveness.
- Causal dimensionality (kappa) measures the effective rank of Jacobian outer products (see the sketch after this list), revealing that transformers causally depend on ~1,990 dimensions despite much larger representational capacity
- The representational-causal wedge shows representational capacity growing 15.6x while causal capacity grows only 4.35x across SAE widths, indicating redundancy in learned representations
- Causal dimensionality is invariant to model scaling, with identical measures across 2.7B and 9B parameter models, suggesting architecture-intrinsic constraints rather than size-dependent properties
- Attribution thresholds drop 20x across network depth while causal dimensionality remains constant, indicating information refinement rather than expansion through layers
- Five validation controls including synthetic ground-truth recovery and architectural variants confirm kappa measures genuine causal influence independent of measurement artifacts
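Since kappa is described as an effective rank of Jacobian outer products, the sketch below shows one common effective-rank estimator, the participation ratio of the eigenvalue spectrum. Whether the paper uses this exact definition is an assumption, and the function names and shapes are hypothetical.

```python
# Hedged sketch: an effective-rank ("kappa"-style) estimate from Jacobians of
# the model output with respect to a layer's representation. The participation
# ratio is one standard effective-rank estimator; the paper's precise
# definition may differ.
import torch

def jacobian_outer_product(jacobians: torch.Tensor) -> torch.Tensor:
    """jacobians: [n_samples, d_out, d_rep] stacked Jacobians dy/dh.
    Returns the [d_rep, d_rep] averaged outer product of the Jacobian rows."""
    n, d_out, d_rep = jacobians.shape
    J = jacobians.reshape(n * d_out, d_rep)
    return J.T @ J / (n * d_out)

def effective_rank(M: torch.Tensor, eps: float = 1e-12) -> float:
    """Participation ratio of the eigenvalue spectrum of a PSD matrix M:
    (sum of eigenvalues)^2 / sum of squared eigenvalues."""
    eigvals = torch.linalg.eigvalsh(M).clamp_min(0.0)
    return float((eigvals.sum() ** 2) / (eigvals.pow(2).sum() + eps))
```

Under this reading, a kappa of ~1,990 would mean the Jacobian spectrum behaves as if roughly 1,990 directions carry the bulk of the causal influence, however wide the learned feature dictionary is.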