AIBullisharXiv – CS AI · 18h ago7/10
🧠
Chiaroscuro Attention: Spending Compute in the Dark
Researchers introduce CHIAR-Former, a hybrid transformer that routes tokens to different operators (DCT spectral mixing, RBF kernel mixing, or full self-attention) based on spectral entropy. The DCT+Attention variant achieves 45% better perplexity than standard attention on WikiText-103 while using 62.5% fewer attention operations, demonstrating significant computational efficiency gains for large-scale language models.