AI · Neutral · arXiv – CS AI · 5h ago · 6/10
P-Cast Precision in FP8 Attention: Sink-Induced Collapse and the Optimality of S=2^8
Researchers analyze precision loss in FP8 (8-bit floating-point) attention, identifying how the Attention Sink phenomenon causes numerical underflow when the probability matrix P is cast to FP8. The study validates engineering choices in FlashAttention-3/4, proving that reverse KV iteration combined with a scaling factor of S=256 eliminates precision collapse. It also provides a closed-form threshold for predicting kernel-level accuracy loss.
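
To make the underflow concrete, here is a minimal Python sketch. It is not the paper's kernel or its derivation. It assumes the FP8 format is E4M3 (3 mantissa bits, smallest subnormal 2^-9, largest finite value 448) with round-to-nearest-even and saturation; the summary itself only says "FP8". The names `quantize_e4m3` and `cast_tail_mass` are hypothetical helpers written for this illustration, and the sketch models neither reverse KV iteration nor the closed-form threshold.

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round a non-negative float to the nearest FP8 E4M3 value.

    Assumed format: 3 mantissa bits, minimum normal exponent -6
    (so the smallest subnormal is 2**-9), saturation at 448.
    """
    if x <= 0.0:
        return 0.0
    max_val = 448.0
    if x >= max_val:
        return max_val
    # frexp returns x = m * 2**ex with 0.5 <= m < 1, so floor(log2(x)) = ex - 1.
    _, ex = math.frexp(x)
    e = max(ex - 1, -6)       # clamp into the subnormal range
    step = 2.0 ** (e - 3)     # spacing between neighbours at this exponent
    # Python's round() is round-half-to-even, matching the assumed rounding mode.
    return min(round(x / step) * step, max_val)

def cast_tail_mass(tail_probs, scale: float) -> float:
    """Probability mass that survives casting scale * p to FP8 and rescaling."""
    return sum(quantize_e4m3(scale * p) / scale for p in tail_probs)

# A sink-dominated row: one token absorbs most of the mass, while the
# remaining 1000 tokens each carry a probability of 2**-12.
tail = [2.0 ** -12] * 1000

print(cast_tail_mass(tail, scale=1.0))    # unscaled cast: tail rounds to zero
print(cast_tail_mass(tail, scale=256.0))  # scaled by S = 2**8: tail is kept
```

Under these assumptions, S = 256 maps the largest probability, 1.0, to 256, which stays below the E4M3 maximum of 448, while lowering the smallest probability that survives the cast by a factor of 2^8.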