AINeutralarXiv – CS AI · 9h ago6/10
🧠
Where does Absolute Position come from in decoder-only Transformers?
Researchers discovered that RoPE-trained transformer models encode absolute position information despite RoPE only encoding relative offsets, with the leakage originating from causal masking and residual stream components. The findings reveal how different architectural variants—NTK scaling, sliding-window attention, and standard RoPE—balance these position-encoding mechanisms differently, with attention sinks serving as token-anchored stabilizers.