AIBullish · arXiv · CS AI · 6h ago
🧠
Attn-QAT: 4-Bit Attention With Quantization-Aware Training
Researchers introduce Attn-QAT, presented as the first systematic approach to 4-bit quantization-aware training for attention mechanisms. The method enables stable FP4 computation on emerging GPUs and delivers up to a 1.5x speedup on an RTX 5090 while maintaining model quality across both diffusion and language models.
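
The blurb doesn't spell out Attn-QAT's recipe, so the sketch below only illustrates the generic ingredient such methods build on: fake-quantizing the attention inputs (Q, K, V) to a signed 4-bit grid during training, with a straight-through estimator so gradients still flow. The names `FakeQuant4bit` and `quantized_attention`, and the per-tensor symmetric scaling, are hypothetical simplifications for illustration, not the paper's actual method.

```python
import torch
import torch.nn.functional as F

class FakeQuant4bit(torch.autograd.Function):
    """Symmetric 4-bit fake quantization with a straight-through estimator.

    NOTE: a minimal illustration of generic QAT, not Attn-QAT itself.
    """

    @staticmethod
    def forward(ctx, x):
        # Per-tensor symmetric scale mapping onto the signed 4-bit grid [-8, 7].
        scale = x.abs().max().clamp(min=1e-8) / 7.0
        q = torch.clamp(torch.round(x / scale), -8, 7)
        # Dequantize so the rest of the network runs in floating point.
        return q * scale

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: treat round/clamp as identity.
        return grad_output

def quantized_attention(q, k, v):
    """Scaled dot-product attention over fake-quantized Q, K, V."""
    q, k, v = (FakeQuant4bit.apply(t) for t in (q, k, v))
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

# Smoke test: batch of 2, sequence of 8, head dim 16.
x = torch.randn(2, 8, 16, requires_grad=True)
out = quantized_attention(x, x, x)
out.sum().backward()  # gradients reach x through the STE
print(out.shape, x.grad is not None)
```

In this setup the model trains against the quantization error it will see at inference; an actual FP4 deployment would replace the fake-quantize ops with hardware 4-bit kernels, which is where the reported RTX 5090 speedup would come from.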