RACE Attention: A Strictly Linear-Time Attention for Long-Sequence Training

arXiv – CS AI | Sahil Joshi, Agniva Chowdhury, Amar Kanakamedala, Ekam Singh, Evan Tu, Anshumali Shrivastava

AI Summary

Researchers introduce RACE Attention, a linear-time alternative to traditional Softmax Attention that can process up to 75 million tokens in a single pass, whereas current GPU-optimized implementations fail beyond roughly 4 million tokens. The method replaces the exponential attention kernel with a sharpened angular similarity and uses Gaussian random projections, yielding large efficiency gains while maintaining accuracy on language modeling and classification tasks.

Key Takeaways
  • RACE Attention achieves strictly linear time complexity, in contrast to the quadratic complexity of traditional Softmax Attention.
  • The system can process up to 12 million tokens on NVIDIA GH200 GPU and 75 million on Intel CPU, far exceeding current capabilities.
  • Performance matches or outperforms existing baselines up to 64K sequence length while reducing memory usage and processing time.
  • The technology replaces exponential kernels with sharpened angular similarity and uses Gaussian random projections to avoid full attention matrix construction.
  • Implementation addresses a fundamental scalability bottleneck in current AI training infrastructure for long-context applications.
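The takeaways above describe the core idea: score keys against queries by angular similarity rather than an exponential kernel, and use Gaussian random projections so the full attention matrix is never materialized. The toy sketch below illustrates that general recipe with signed random projections (a standard angular-similarity LSH): keys are hashed into buckets, each bucket accumulates a value sum and a count, and each query reads only its own bucket, so cost grows linearly with sequence length. Function names, the bucketing scheme, and all parameters here are illustrative assumptions, not the paper's actual RACE algorithm.

```python
import numpy as np

def srp_bucket_ids(x, projections):
    # Signs of Gaussian random projections form a binary code whose
    # collision probability depends on angular similarity (SRP-LSH).
    bits = (x @ projections) > 0                      # (n, n_bits) booleans
    return bits @ (1 << np.arange(bits.shape[1]))     # (n,) integer bucket ids

def lsh_linear_attention(Q, K, V, n_bits=8, seed=0):
    """Toy linear-time attention sketch (NOT the paper's method):
    keys are bucketed by angular LSH; each query averages the values
    of keys that landed in its bucket. One pass over keys plus one
    pass over queries => O(n) in sequence length, no n x n matrix."""
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((Q.shape[1], n_bits))     # shared Gaussian projections
    q_ids = srp_bucket_ids(Q, P)
    k_ids = srp_bucket_ids(K, P)
    n_buckets = 1 << n_bits
    sums = np.zeros((n_buckets, V.shape[1]))
    counts = np.zeros(n_buckets)
    np.add.at(sums, k_ids, V)                         # accumulate values per bucket
    np.add.at(counts, k_ids, 1.0)                     # count keys per bucket
    return sums[q_ids] / np.maximum(counts[q_ids], 1.0)[:, None]
```

With a single key equal to the query, the query's bucket contains exactly that key, so the output reproduces its value; with many keys, each query gets an average over angularly similar keys, which is the rough intuition behind trading exact softmax weights for hashed angular similarity.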
Read Original via arXiv – CS AI