AI · Bullish · arXiv – CS AI · 15h ago · 7/10
InfoQuant: Shaping Activation Distributions for Low-Bit LLM Quantization
Researchers introduce InfoQuant, a training-free method that reshapes activation distributions in large language models so they survive low-bit quantization, using a Peak Suppression Orthogonal Transformation. Under W4A4KV4 quantization (4-bit weights, activations, and KV cache), the technique is reported to achieve 97% accuracy preservation and to reduce performance degradation by 42% compared to previous methods, advancing efficient LLM deployment.
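The summary does not spell out how the Peak Suppression Orthogonal Transformation is built, so the sketch below only illustrates the general idea behind orthogonal transforms for activation quantization, not InfoQuant itself. It assumes a fixed normalized Hadamard rotation as a stand-in, a toy activation matrix with one outlier channel, and simple symmetric per-tensor 4-bit rounding; every name here (`hadamard`, `quantize_int4`, `X`, `W`, `Q`) is hypothetical.

```python
import numpy as np

def hadamard(n):
    """Orthonormal Hadamard matrix via Sylvester construction (n must be a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def quantize_int4(x):
    """Symmetric per-tensor 4-bit fake quantization: round to the int4 grid, then dequantize."""
    scale = np.abs(x).max() / 7.0
    q = np.clip(np.round(x / scale), -8, 7)
    return q * scale

rng = np.random.default_rng(0)
d = 64
X = rng.normal(size=(128, d))   # toy activations (assumption, not real LLM data)
X[:, 3] *= 20.0                 # one outlier channel creates a large peak
W = rng.normal(size=(d, d))     # toy weight matrix, kept in full precision here

# Stand-in orthogonal transform: rotate activations, counter-rotate weights.
Q = hadamard(d)
X_rot = X @ Q
W_rot = Q.T @ W                 # (X Q)(Q^T W) = X W, so the layer output is unchanged

exact = X @ W
err_plain = np.linalg.norm(quantize_int4(X) @ W - exact)
err_rot = np.linalg.norm(quantize_int4(X_rot) @ W_rot - exact)
peak_plain = np.abs(X).max()
peak_rot = np.abs(X_rot).max()

print(f"peak |activation|: plain={peak_plain:.2f}, rotated={peak_rot:.2f}")
print(f"output error after 4-bit activations: plain={err_plain:.2f}, rotated={err_rot:.2f}")
```

Because `Q` is orthogonal, the rotation leaves the full-precision output untouched while spreading the outlier channel across all dimensions, which shrinks the peak and therefore the quantization step. A method like the one summarized above presumably chooses its transform to suppress peaks more deliberately than a fixed Hadamard matrix does.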