AI · Neutral · arXiv – CS AI · 9h ago · 6/10
When Good Enough Is Optimal: Multiplication-Only Matrix Inversion Approximation for Quantized Gated DeltaNet
Researchers propose a matrix-inversion approximation for linear attention mechanisms that uses only matrix multiplications, reporting up to a 5x speedup on neural processing units (NPUs) while maintaining model accuracy under both standard and low-precision inference. The method targets a computational bottleneck in long-context language modeling by combining a truncated Neumann expansion with parallel residual correction.
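
To make the idea concrete, here is a minimal NumPy sketch of a multiplication-only inverse. It is an illustration under stated assumptions, not the paper's algorithm: it assumes the matrix to invert has the form I + N with N strictly lower triangular, and it uses a single Newton-Schulz-style step as a stand-in for the "residual correction" the summary mentions. The function names `neumann_inverse` and `residual_correct` are hypothetical.

```python
import numpy as np

def neumann_inverse(N, terms):
    """Approximate (I + N)^{-1} by the truncated Neumann series
    sum_{k=0}^{terms-1} (-N)^k, using only matmuls and additions."""
    I = np.eye(N.shape[0], dtype=N.dtype)
    X = I.copy()   # running sum of the series
    P = I.copy()   # current power (-N)^k
    for _ in range(terms - 1):
        P = P @ (-N)
        X = X + P
    return X

def residual_correct(M, X):
    """One Newton-Schulz-style step (an assumed stand-in for the
    paper's correction): X <- X + X @ (I - M @ X)."""
    I = np.eye(M.shape[0], dtype=M.dtype)
    R = I - M @ X          # residual of the current approximation
    return X + X @ R

# Assumed test setup: a small random strictly lower-triangular N.
rng = np.random.default_rng(0)
n = 16
N = np.tril(rng.standard_normal((n, n)) * 0.1, k=-1)
M = np.eye(n) + N

exact = np.linalg.inv(M)               # reference only, not part of the method
X4 = neumann_inverse(N, terms=4)       # truncated after 4 terms
X4c = residual_correct(M, X4)          # same, plus one correction step

err_trunc = np.abs(X4 - exact).max()
err_corr = np.abs(X4c - exact).max()
print("truncated error:", err_trunc)
print("corrected error:", err_corr)
```

Because N is strictly lower triangular it is nilpotent, so the series terminates and `neumann_inverse(N, n)` is exact up to rounding. Truncating earlier trades accuracy for fewer matmuls, and in this sketch one correction step turns a residual of order N^4 into one of order N^8.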