AI · Bullish · arXiv – CS AI · 3h ago · 7/10
GoQuant: Geometric Orthogonal Residual Projection for Multiplier-Free Power-of-Two Transformer Quantization
GoQuant introduces Orthogonal Residual Projection (ORP), a quantization framework for deploying large language models efficiently on edge devices. It quantizes values to power-of-two levels, so multiplications become bit-shifts. At 3-bit precision the approach achieves competitive performance while cutting calibration time to 15 minutes, and it targets fundamental geometric limitations of power-of-two quantization.
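To show why power-of-two levels remove multipliers, here is a minimal sketch of plain per-tensor power-of-two quantization. It is not ORP: the summary does not describe the orthogonal residual projection, so that part is not reproduced. The 3-bit layout (1 sign bit plus 2 exponent bits), the per-tensor `step` scale, and the helper names `quantize_pot`, `dequantize_pot`, and `pot_dot` are all illustrative assumptions. Each weight becomes `sign * step * 2**e`, so a dot product with integer activations needs only left shifts and additions, plus a single float rescale per output.

```python
import math
from typing import List, Tuple

# Assumed 3-bit code: 1 sign bit + 2 exponent bits -> exponents 0..3.
NUM_EXP_LEVELS = 4


def quantize_pot(weights: List[float]) -> Tuple[List[int], List[int], float]:
    """Hypothetical helper: map each weight to sign * step * 2**e, e in [0, 3].

    step is chosen so the largest |w| maps to exponent 3. This simple
    scheme has no zero level: the smallest magnitude is step itself.
    """
    max_abs = max(abs(w) for w in weights)
    step = max_abs / 2 ** (NUM_EXP_LEVELS - 1) if max_abs > 0 else 1.0
    signs: List[int] = []
    exps: List[int] = []
    for w in weights:
        signs.append(-1 if w < 0 else 1)
        if w == 0:
            exps.append(0)  # no zero code; fall back to the smallest level
            continue
        # Round in the log domain, then clip to the representable exponents.
        e = int(round(math.log2(abs(w) / step)))
        exps.append(min(max(e, 0), NUM_EXP_LEVELS - 1))
    return signs, exps, step


def dequantize_pot(signs: List[int], exps: List[int], step: float) -> List[float]:
    """Reference float weights reconstructed from the power-of-two codes."""
    return [s * step * (1 << e) for s, e in zip(signs, exps)]


def pot_dot(x: List[int], signs: List[int], exps: List[int], step: float) -> float:
    """Multiplier-free dot product for integer activations x.

    Each x_i * 2**e_i is a left shift; the only multiply is the final
    rescale by step, done once per output rather than once per weight.
    """
    acc = 0
    for xi, s, e in zip(x, signs, exps):
        term = xi << e  # bit-shift replaces multiplication by 2**e
        acc += term if s > 0 else -term
    return acc * step


if __name__ == "__main__":
    signs, exps, step = quantize_pot([0.8, -0.4, 0.1, -0.05])
    x = [1, 2, 3, 4]
    print(signs, exps, step)
    print(pot_dot(x, signs, exps, step))
```

Because the accumulator stays an exact integer, the shift-based result matches a float dot product against the dequantized weights up to rounding in the final rescale. Real deployments would typically use per-channel or per-group scales, and the geometric correction GoQuant describes would sit on top of a scheme like this.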