AI · Neutral · arXiv - CS AI · 6d ago · 7/104
Residual Connections and the Causal Shift: Uncovering a Structural Misalignment in Transformers
Researchers identified a structural misalignment in autoregressive Transformers: residual connections carry forward the representation of the current input token, while the training objective supervises prediction of the next token. They propose lightweight residual attenuation techniques that address this input-output shift and improve autoregressive Transformer performance.
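The summary does not say how the attenuation is implemented. As a rough illustration only, one way to picture "residual attenuation" is to scale the skip path by a factor below 1, so that less of the current-token representation is passed straight through. The sketch below assumes that form. The names `residual_block`, `toy_sublayer`, and `alpha` are hypothetical and are not taken from the paper.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(
    x: Vector,
    sublayer: Callable[[Vector], Vector],
    alpha: float = 1.0,
) -> Vector:
    """Return alpha * x + sublayer(x), element by element.

    alpha = 1.0 is the standard residual connection.
    alpha < 1.0 attenuates the skip path (an assumed, illustrative form
    of residual attenuation, not the paper's confirmed method).
    """
    fx = sublayer(x)
    return [alpha * xi + fi for xi, fi in zip(x, fx)]


def toy_sublayer(x: Vector) -> Vector:
    # Hypothetical stand-in for an attention or MLP sublayer:
    # it simply halves every component.
    return [0.5 * xi for xi in x]


x = [1.0, -2.0, 4.0]

# Standard residual: the current-token representation passes through unscaled.
standard = residual_block(x, toy_sublayer, alpha=1.0)

# Attenuated residual: the skip path contributes only a quarter of x.
attenuated = residual_block(x, toy_sublayer, alpha=0.25)

print(standard)    # [1.5, -3.0, 6.0]
print(attenuated)  # [0.75, -1.5, 3.0]
```

With `alpha=1.0` the block reduces to the usual `x + f(x)`. Lowering `alpha` shifts weight from the pass-through input toward the sublayer output, which is the intuition behind weakening the tie to the current token.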