AI · Neutral · arXiv – CS AI · 7h ago · 6/10
SPEAR: A System for Post-Quantization Error-Adaptive Recovery Enabling Efficient Low-Bit LLM Serving
SPEAR is a system that makes serving quantized large language models (LLMs) more efficient by applying error correction adapted to each token, rather than a static correction applied uniformly to all tokens. It recovers 56-75% of the performance gap between 4-bit and full-precision models while adding minimal memory overhead, which makes low-bit LLM deployment at scale more practical.
🏢 Perplexity