RACE Attention: A Strictly Linear-Time Attention for Long-Sequence Training

arXiv – CS AI | Sahil Joshi, Agniva Chowdhury, Amar Kanakamedala, Ekam Singh, Evan Tu, Anshumali Shrivastava

AI Summary

Researchers introduce RACE Attention, a linear-time alternative to traditional Softmax Attention that can process up to 75 million tokens in a single pass, whereas current GPU-optimized implementations fail beyond roughly 4 million tokens. The method replaces the exponential attention kernel with a sharpened angular similarity and uses Gaussian random projections, yielding large efficiency gains while maintaining accuracy on language modeling and classification tasks.

Key Takeaways
  • RACE Attention achieves strictly linear time complexity, in contrast to the quadratic complexity of traditional Softmax Attention.
  • The system can process up to 12 million tokens on NVIDIA GH200 GPU and 75 million on Intel CPU, far exceeding current capabilities.
  • Performance matches or outperforms existing baselines up to 64K sequence length while reducing memory usage and processing time.
  • The technology replaces exponential kernels with sharpened angular similarity and uses Gaussian random projections to avoid full attention matrix construction.
  • Implementation addresses a fundamental scalability bottleneck in current AI training infrastructure for long-context applications.
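The takeaways above describe the core idea: score keys against queries by angular similarity rather than an exponential kernel, and use Gaussian random projections so the full attention matrix is never materialized. The toy sketch below illustrates that general recipe with signed random projections (a standard angular-similarity LSH): keys are hashed into buckets, each bucket accumulates a value sum and a count, and each query reads only its own bucket, so cost grows linearly with sequence length. Function names, the bucketing scheme, and all parameters here are illustrative assumptions, not the paper's actual RACE algorithm.

```python
import numpy as np

def srp_bucket_ids(x, projections):
    # Signs of Gaussian random projections form a binary code whose
    # collision probability depends on angular similarity (SRP-LSH).
    bits = (x @ projections) > 0                      # (n, n_bits) booleans
    return bits @ (1 << np.arange(bits.shape[1]))     # (n,) integer bucket ids

def lsh_linear_attention(Q, K, V, n_bits=8, seed=0):
    """Toy linear-time attention sketch (NOT the paper's method):
    keys are bucketed by angular LSH; each query averages the values
    of keys that landed in its bucket. One pass over keys plus one
    pass over queries => O(n) in sequence length, no n x n matrix."""
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((Q.shape[1], n_bits))     # shared Gaussian projections
    q_ids = srp_bucket_ids(Q, P)
    k_ids = srp_bucket_ids(K, P)
    n_buckets = 1 << n_bits
    sums = np.zeros((n_buckets, V.shape[1]))
    counts = np.zeros(n_buckets)
    np.add.at(sums, k_ids, V)                         # accumulate values per bucket
    np.add.at(counts, k_ids, 1.0)                     # count keys per bucket
    return sums[q_ids] / np.maximum(counts[q_ids], 1.0)[:, None]
```

With a single key equal to the query, the query's bucket contains exactly that key, so the output reproduces its value; with many keys, each query gets an average over angularly similar keys, which is the rough intuition behind trading exact softmax weights for hashed angular similarity.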
Read Original via arXiv – CS AI