AIBearish · arXiv CS AI · 7h ago · 7/10
🧠
The Illusion of Equivalence: Systematic FP16 Divergence in KV-Cached Autoregressive Inference
Researchers have discovered that FP16 (half-precision) arithmetic causes systematic numerical divergence between KV-cached and cache-free inference in transformer models, producing 100% token divergence across multiple architectures. This challenges the long-held assumption that KV caching is numerically equivalent to recomputing attention from scratch: because FP16 addition is non-associative, the two code paths accumulate the same sums in different orders and can round differently. Controlled FP32 experiments confirm FP16 non-associativity as the causal mechanism.
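As a minimal illustration (a sketch for this summary, not code from the paper), the NumPy snippet below shows the mechanism in isolation: the same terms summed in two different orders give different FP16 results, which is exactly the kind of reordering that distinguishes a cached attention kernel from a cache-free recomputation.

```python
import numpy as np

# Deterministic 3-term case: 0.1 is absorbed when added to 4096 first,
# because the FP16 spacing (ulp) at 4096 is 4.
a, b, c = np.float16(0.1), np.float16(4096), np.float16(-4096)
print((a + b) + c)  # 0.0
print(a + (b + c))  # ~0.1

# Larger reduction: sum identical values left-to-right vs. right-to-left,
# standing in (as an assumption of this sketch) for the different
# accumulation orders of cached vs. cache-free attention.
rng = np.random.default_rng(0)
x = rng.standard_normal(4096).astype(np.float16)

fwd = np.float16(0.0)
for v in x:
    fwd = fwd + v  # one FP16 rounding per step, in this order
rev = np.float16(0.0)
for v in x[::-1]:
    rev = rev + v  # same terms, different order, different roundings

print(fwd, rev, fwd == rev)  # the two FP16 sums typically differ
```

Under greedy decoding, a perturbation this small can flip an argmax; once a single token differs, every subsequent token can differ, which is consistent with the 100% divergence the post describes.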