🤖 AI Summary
Researchers introduce DARKFormer, a new transformer architecture that reduces attention's computational complexity from quadratic to linear in sequence length while maintaining performance. The model uses data-aware random-feature kernels to reduce the high estimator variance that standard random-feature attention suffers when applied to pretrained transformer models with anisotropic query-key distributions.
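The summary does not reproduce the paper's equations, but the standard random-feature linearization that this line of work builds on can be written as follows; the notation is a common convention and is assumed here, not taken from the paper itself.

```latex
% Random features \phi with E[\phi(q)^\top \phi(k)] \approx \exp(q^\top k)
% let attention be factorized so the n x n score matrix is never formed:
\mathrm{softmax}(QK^\top)V \;\approx\; D^{-1}\,\phi(Q)\bigl(\phi(K)^\top V\bigr),
\qquad D = \mathrm{diag}\bigl(\phi(Q)\,\phi(K)^\top \mathbf{1}_n\bigr).
```

Computing φ(K)ᵀV first costs O(n·m·d) for m random features, so the cost grows linearly rather than quadratically in the sequence length n.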
Key Takeaways
- DARKFormer reduces transformer attention complexity from quadratic to linear in sequence length.
- The model addresses the high Monte Carlo variance of existing random-feature attention mechanisms.
- Data-aligned kernels provide better training stability and performance than isotropic sampling (a rough illustration follows this list).
- The approach is particularly effective in finetuning scenarios with pretrained anisotropic representations.
- DARKFormer narrows the performance gap with exact softmax attention while retaining the efficiency gains.
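A minimal NumPy sketch of this recipe, under stated assumptions: the positive-feature map follows the standard Performer-style construction, while the covariance-matched "data-aligned" sampling of projection directions is only an illustrative guess at what a data-aware kernel might look like. All function names, shapes, and the regularization constant are assumptions rather than DARKFormer's actual construction, and an unbiased estimator would need a matching importance-weight correction that this sketch omits.

```python
import numpy as np

def softmax_kernel_features(x, projections):
    # Positive random features for the exponential kernel (Performer-style):
    # phi(x) = exp(W x - ||x||^2 / 2) / sqrt(m), so E[phi(q) . phi(k)] ~ exp(q . k)
    # when the rows of W are drawn from an isotropic Gaussian.
    m = projections.shape[0]
    proj = x @ projections.T                                  # (n, m)
    sq_norm = 0.5 * np.sum(x ** 2, axis=-1, keepdims=True)    # (n, 1)
    return np.exp(proj - sq_norm) / np.sqrt(m)

def data_aligned_projections(qk_samples, num_features, rng):
    # Illustrative "data-aware" sampling (an assumption, not the paper's method):
    # draw projection directions from a Gaussian whose covariance matches the
    # observed query/key statistics instead of an isotropic N(0, I).
    d = qk_samples.shape[-1]
    cov = np.cov(qk_samples.reshape(-1, d), rowvar=False) + 1e-6 * np.eye(d)
    chol = np.linalg.cholesky(cov)
    return rng.standard_normal((num_features, d)) @ chol.T

def linear_attention(q, k, v, projections):
    # softmax(QK^T)V ~ phi(Q) [phi(K)^T V] / (phi(Q) [phi(K)^T 1]):
    # the (n x n) score matrix is never formed, so cost is O(n * m * d).
    q_feat = softmax_kernel_features(q, projections)          # (n, m)
    k_feat = softmax_kernel_features(k, projections)          # (n, m)
    kv = k_feat.T @ v                                         # (m, d_v)
    normalizer = q_feat @ k_feat.sum(axis=0)                  # (n,)
    return (q_feat @ kv) / normalizer[:, None]

# Toy usage: shapes and scales are arbitrary.
rng = np.random.default_rng(0)
n, d, m = 128, 64, 256
q = rng.standard_normal((n, d)) / np.sqrt(d)
k = rng.standard_normal((n, d)) / np.sqrt(d)
v = rng.standard_normal((n, d))
W = data_aligned_projections(np.concatenate([q, k]), m, rng)
print(linear_attention(q, k, v, W).shape)  # (128, 64)
```

The point of the sketch is the cost profile: the n×n attention matrix is never materialized, and swapping isotropic Gaussian projections for data-aligned ones changes only how the projection matrix is drawn.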
#transformer #attention-mechanism #computational-efficiency #machine-learning #neural-networks #darkformer #random-features #kernel-methods
Read Original → via arXiv – CS AI