AINeutralarXiv – CS AI · 5h ago6/10
🧠
Limitations of Normalization in Attention Mechanism
Researchers present a theoretical and empirical analysis of softmax normalization limitations in attention mechanisms, demonstrating that as token selection increases, models lose their ability to distinguish important tokens and converge toward uniform selection patterns. The findings highlight gradient sensitivity challenges during training and suggest that improved normalization strategies are needed for more effective attention architectures.