AIBullish · arXiv – CS AI · 10h ago · 7/10
🧠
Kaczmarz Linear Attention
Researchers propose Kaczmarz Linear Attention (KLA), an algorithm for long-context language modeling that replaces empirically learned update coefficients with mathematically derived, key-norm-normalized step sizes. KLA outperforms existing linear attention baselines such as Gated DeltaNet while maintaining computational efficiency and processing contexts of up to 65K tokens stably.
🏢 Perplexity
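The core idea can be sketched with the classical Kaczmarz step: instead of a learned coefficient, the state update uses a step size of 1/‖k‖², which exactly projects the state onto the constraint S·k = v. This is a minimal NumPy sketch of that principle, not the paper's exact formulation (the function name and toy setup are illustrative assumptions):

```python
import numpy as np

def kaczmarz_state_update(S, k, v):
    """One Kaczmarz-style update of a linear-attention state matrix S.

    Sketch only: the step size 1 / ||k||^2 is derived from the key norm
    rather than learned, which is the principle the KLA summary describes.
    """
    beta = 1.0 / (k @ k)            # key-norm-normalized step size
    err = v - S @ k                 # prediction error for this key
    return S + beta * np.outer(err, k)

# Toy usage: after one update, the state reproduces v for key k exactly.
d = 4
rng = np.random.default_rng(0)
S = rng.standard_normal((d, d))
k = rng.standard_normal(d)
v = rng.standard_normal(d)
S = kaczmarz_state_update(S, k, v)
print(np.allclose(S @ k, v))  # True
```

The derived step size makes the update an exact orthogonal projection, removing one learned hyperparameter from the recurrence.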