AIBullish · arXiv - CS AI · 5d ago
🧠
Long-Context Generalization with Sparse Attention
Researchers introduce ASEntmax, a new attention mechanism for transformer models that combines sparse attention with a learnable temperature parameter, allowing attention heads to assign exactly zero weight to irrelevant tokens. This approach significantly outperforms standard softmax attention on length generalization, extrapolating to sequences up to 1000x longer than those seen in training on synthetic tasks and improving long-context performance in language modeling.
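For intuition, here is a minimal sketch of the core idea, not the paper's actual implementation: it uses sparsemax (the alpha=2 special case of entmax) in place of the full ASEntmax parameterization, and a single learnable temperature that controls how sparse the attention distribution becomes. All class and variable names below are illustrative.

```python
import torch
import torch.nn as nn

def sparsemax(z: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Sparsemax (the alpha=2 case of entmax): Euclidean projection of the
    scores onto the probability simplex, yielding exact zeros for low scores."""
    z_sorted, _ = torch.sort(z, dim=dim, descending=True)
    shape = [1] * z.dim()
    shape[dim] = -1
    k = torch.arange(1, z.size(dim) + 1, device=z.device, dtype=z.dtype).view(shape)
    z_cumsum = z_sorted.cumsum(dim)
    support = 1 + k * z_sorted > z_cumsum            # entries kept in the support
    k_support = support.sum(dim=dim, keepdim=True)   # support size per row
    tau_sum = z_cumsum.gather(dim, k_support - 1)    # cumulative score over the support
    tau = (tau_sum - 1) / k_support.to(z.dtype)      # threshold subtracted from scores
    return torch.clamp(z - tau, min=0)

class SparseAttentionHead(nn.Module):
    """Single attention head with sparse normalization and a learnable
    temperature; a simplified stand-in for ASEntmax, whose exact
    parameterization is described in the paper."""
    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.log_temp = nn.Parameter(torch.zeros(1))  # learnable temperature (log-space)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
        # Lower temperature -> peakier scores -> sparser attention pattern,
        # which is what lets the model ignore distractors at long range.
        weights = sparsemax(scores / self.log_temp.exp(), dim=-1)
        return weights @ v

x = torch.randn(2, 16, 64)        # (batch, sequence, d_model)
out = SparseAttentionHead(64)(x)  # -> (2, 16, 64)
```

Unlike softmax, which always spreads some probability mass over every position, sparsemax can zero out most of the context entirely, and the learnable temperature lets each head tune how aggressive that pruning is.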