#asentmax · 1 article
AI · Bullish · arXiv – CS AI · 5d ago · 7/10
🧠

Long-Context Generalization with Sparse Attention

Researchers introduce ASEntmax, a sparse attention mechanism for transformer models that combines an entmax normalizer with learnable temperature parameters, letting attention weights drop to exactly zero on irrelevant tokens. It substantially outperforms standard softmax attention, extrapolating to sequences up to 1000x the training length on synthetic tasks and improving long-context performance in language modeling.
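
To make the mechanism concrete, below is a minimal PyTorch sketch of single-head attention with a learnable temperature and a sparse normalizer. It is not the paper's implementation: sparsemax (the alpha=2 member of the entmax family) stands in for ASEntmax, whose exact formulation the summary does not give, and the module and parameter names (SparseAttention, log_temp) are illustrative.

import torch
import torch.nn as nn

def sparsemax(z: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Sparsemax (Martins & Astudillo, 2016): projects the logits onto the
    # probability simplex. Unlike softmax, low-scoring entries receive
    # exactly zero weight. It is the alpha=2 case of the entmax family.
    z_sorted, _ = torch.sort(z, dim=dim, descending=True)
    k = torch.arange(1, z.size(dim) + 1, device=z.device, dtype=z.dtype)
    shape = [1] * z.dim()
    shape[dim] = -1
    k = k.view(shape)                        # broadcastable rank index
    z_cumsum = z_sorted.cumsum(dim) - 1.0
    support = (k * z_sorted) > z_cumsum      # entries kept in the support
    k_support = support.to(z.dtype).sum(dim=dim, keepdim=True)
    tau = z_cumsum.gather(dim, k_support.long() - 1) / k_support
    return torch.clamp(z - tau, min=0.0)     # sparse probability vector

class SparseAttention(nn.Module):
    # Single-head attention with a learnable temperature and a sparse
    # normalizer; a hypothetical sketch, not the paper's code.
    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.log_temp = nn.Parameter(torch.zeros(1))  # learnable temperature

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.q(x), self.k(x), self.v(x)
        scale = (x.size(-1) ** 0.5) * torch.exp(self.log_temp)
        scores = q @ k.transpose(-2, -1) / scale
        weights = sparsemax(scores, dim=-1)  # exact zeros on distractors
        return weights @ v

Because sparse weights zero out distractor tokens entirely, attention to the relevant positions does not dilute as the context grows, which is the intuition behind the length-extrapolation results.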