RACE Attention: A Strictly Linear-Time Attention for Long-Sequence Training
Researchers introduce RACE Attention, a new linear-time alternative to traditional Softmax Attention that can process up to 75 million tokens in a single pass, compared to current GPU-optimized implementations that fail beyond 4 million tokens. The technology uses angular similarity and Gaussian random projections to achieve dramatic efficiency gains while maintaining performance across language modeling and classification tasks.