arXiv – CS AI · 10h ago
Sink vs. diagonal patterns as mechanisms for attention switch and oversmoothing prevention
Researchers analyze how attention mechanisms in transformers use sinks (tokens that absorb a disproportionate share of a head's attention mass) and diagonal patterns to prevent oversmoothing and enable efficient computation. The study establishes mathematical conditions under which sinks outperform alternatives and proves an equivalence between sinks and hard attention switches, providing a theoretical foundation for design choices in pretrained transformers.
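As a concrete illustration of the mechanism described above, here is a minimal NumPy sketch under simplified assumptions, not the paper's construction: a fixed row-stochastic attention pattern is applied repeatedly with a pre-norm-style residual update. A near-uniform pattern pulls every token toward a common direction (oversmoothing), while a sink column whose value vector is near zero, or a diagonal pattern, leaves tokens largely unchanged; this is the intuitive sense in which attending to a sink resembles switching the head off. The matrices `A_uniform`, `A_sink`, `A_diag`, the depth, and `eps` are illustrative choices, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, depth, eps = 6, 8, 16, 0.05
X = rng.normal(size=(n, d))

def normalize(Y):
    # Crude stand-in for LayerNorm: rescale each token vector to unit norm.
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)

def spread(Y):
    # Mean pairwise distance between normalized tokens; near 0 means oversmoothed.
    Yn = normalize(Y)
    return np.linalg.norm(Yn[:, None, :] - Yn[None, :, :], axis=-1).mean()

# Three fixed row-stochastic attention patterns over n tokens (token 0 plays the sink).
A_uniform = np.full((n, n), 1.0 / n)

A_sink = np.full((n, n), eps / (n - 1))
A_sink[:, 0] = 1.0 - eps             # almost all attention mass on the sink column

A_diag = np.full((n, n), eps / (n - 1))
np.fill_diagonal(A_diag, 1.0 - eps)  # almost all attention mass on the diagonal

def run(A, sink_value_is_zero):
    Y = normalize(X.copy())
    for _ in range(depth):
        V = Y.copy()
        if sink_value_is_zero:
            # The sink contributes (almost) no value, so attending to it
            # acts like switching the head off for that token.
            V[0] = 0.0
        Y = normalize(Y + A @ V)     # pre-norm-style residual update
    return spread(Y)

print(f"uniform  spread: {run(A_uniform, sink_value_is_zero=False):.4f}")  # collapses toward 0
print(f"sink     spread: {run(A_sink,    sink_value_is_zero=True):.4f}")   # stays comparatively large
print(f"diagonal spread: {run(A_diag,    sink_value_is_zero=False):.4f}")  # stays comparatively large
```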