AIBullish — arXiv · CS AI · 4d ago
SageBwd: A Trainable Low-Bit Attention Mechanism
Researchers have developed SageBwd, a trainable INT8 attention mechanism that quantizes six of the seven matrix multiplications in attention while matching full-precision attention performance during pre-training. The study identifies key factors for stable low-bit training, including the need for QK-norm and the effect of tokens per step on quantization error.
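To illustrate the quantized-matmul pattern behind such schemes, here is a minimal NumPy sketch of symmetric per-tensor INT8 quantization applied to the QKᵀ score computation. This is an assumption-laden toy, not SageBwd's actual per-block quantizer or kernel:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: int8 values plus a float scale.
    (Illustrative only; SageBwd uses a finer-grained scheme.)"""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
Q = rng.standard_normal((8, 16)).astype(np.float32)
K = rng.standard_normal((8, 16)).astype(np.float32)

q_q, s_q = quantize_int8(Q)
k_q, s_k = quantize_int8(K)

# int8 x int8 matmul accumulated in int32, then rescaled back to float --
# the basic pattern for replacing a full-precision attention matmul.
scores_int8 = q_q.astype(np.int32) @ k_q.astype(np.int32).T * (s_q * s_k)
scores_fp = Q @ K.T
err = np.abs(scores_int8 - scores_fp).max()
print(f"max abs error vs FP32: {err:.4f}")
```

The dequantized scores stay close to the full-precision result for well-conditioned inputs; the paper's point is that keeping one of the seven matmuls in higher precision, plus QK-norm, is what keeps this error from destabilizing training.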