AIBullish · arXiv – CS AI · 5h ago · 7/10
FASQ: Flexible Accelerated Subspace Quantization for Calibration-Free LLM Compression
Researchers introduce FASQ, a calibration-free compression framework for large language models that uses product quantization to achieve flexible compression ratios, shrinking models to 27–49% of their original size. The method outperforms existing quantization approaches such as GPTQ and AWQ while enabling faster-than-FP16 inference on consumer GPUs through custom CUDA kernels.
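The summary does not describe FASQ's internals, but the product quantization it builds on can be illustrated generically: a vector is split into subvectors, and each subvector is replaced by the index of its nearest centroid in a small per-subspace codebook. The sketch below is purely illustrative (all codebooks, dimensions, and values are invented, not FASQ's):

```python
# Minimal product-quantization sketch (illustrative; NOT the FASQ method).
# Each subvector is encoded as the index of its nearest codebook centroid,
# so storage drops from floats to small integer codes.

def quantize(vec, codebooks, sub_dim):
    """Encode vec as one centroid index per subspace."""
    codes = []
    for m, i in enumerate(range(0, len(vec), sub_dim)):
        sub = vec[i:i + sub_dim]
        # pick the centroid minimizing squared Euclidean distance
        best = min(
            range(len(codebooks[m])),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(sub, codebooks[m][j])),
        )
        codes.append(best)
    return codes

def dequantize(codes, codebooks):
    """Reconstruct an approximate vector from the stored codes."""
    out = []
    for m, c in enumerate(codes):
        out.extend(codebooks[m][c])
    return out

# Toy example: a 4-dim vector, 2 subspaces of dim 2, 2 centroids per codebook.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],    # subspace 0
    [[0.5, -0.5], [-1.0, 2.0]],  # subspace 1
]
vec = [0.9, 1.1, -0.8, 1.9]
codes = quantize(vec, codebooks, 2)       # → [1, 1]
approx = dequantize(codes, codebooks)     # → [1.0, 1.0, -1.0, 2.0]
```

With M subspaces and K centroids each, every group of `sub_dim` floats is stored as a single log2(K)-bit code plus the shared codebooks, which is where the flexible compression ratio comes from.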
Llama