AI · Bullish · arXiv – CS AI · 3h ago · 7/10
Hurwitz Quaternion Multiplicative Quantization for KV Cache Compression
Researchers propose Hurwitz Quaternion Multiplicative Quantization (HQMQ), a calibration-free method that uses quaternion arithmetic to compress the key-value (KV) caches of large language models. The technique achieves 5x compression with minimal perplexity loss and matches full-precision performance at roughly 5 bits. It also outperforms existing quantization methods across five major model architectures.
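The summary does not describe HQMQ's actual algorithm. The sketch below shows only one plausible building block: rounding groups of 4 values onto the Hurwitz quaternion lattice. That lattice consists of quaternions whose four coordinates are either all integers or all half-odd-integers. The per-group `scale`, the grouping of a KV vector into 4-element blocks, and the function names `nearest_hurwitz` and `quantize_vector` are illustrative assumptions. The sketch does not cover the "multiplicative" step, scale selection, or bit packing.

```python
import numpy as np

def nearest_hurwitz(q):
    """Return the Hurwitz quaternion (length-4 array) closest to q.

    Hurwitz quaternions have coordinates that are either all integers
    or all in Z + 1/2, so the lattice is the union of two cosets.
    """
    q = np.asarray(q, dtype=float)
    # Candidate 1: nearest point with all-integer coordinates.
    integer_pt = np.round(q)
    # Candidate 2: nearest point with all half-odd-integer coordinates.
    half_pt = np.floor(q) + 0.5
    # Keep whichever coset's candidate is closer in Euclidean distance.
    if np.sum((q - integer_pt) ** 2) <= np.sum((q - half_pt) ** 2):
        return integer_pt
    return half_pt

def quantize_vector(x, scale):
    """Hypothetical helper: quantize x in 4-element blocks with one scale.

    Returns the dequantized reconstruction, not a packed bit code.
    """
    x = np.asarray(x, dtype=float)
    if x.size % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    blocks = x.reshape(-1, 4) / scale
    snapped = np.array([nearest_hurwitz(b) for b in blocks])
    return (snapped * scale).reshape(x.shape)
```

For example, `nearest_hurwitz([0.4, 0.6, 0.45, 0.55])` snaps to the half-integer point `[0.5, 0.5, 0.5, 0.5]` rather than to an integer point. Using both cosets gives a denser 4-D grid than plain integer rounding, which is one reason lattice-based quantizers like this are attractive.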