AIBullish · arXiv · CS AI · 8h ago · 7/10
🧠
GlowQ: Group-Shared LOw-Rank Approximation for Quantized LLMs
Researchers propose GlowQ, a quantization technique for large language models that reduces memory overhead and inference latency while maintaining accuracy. The method uses a group-shared low-rank approximation to make quantized LLMs cheaper to deploy, and is reported to show significant improvements over existing quantization approaches.
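The summary doesn't spell out how the group-shared low-rank approximation is applied, so the sketch below is one plausible reading rather than the paper's method: weights are quantized in fixed-size groups, and a single rank-r factor pair, shared across all groups, is fit to the quantization residual so inference pays only one extra small matmul per layer. All names (glowq_sketch, quantize_group) and parameters (group_size, rank, n_bits) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def quantize_group(w, n_bits=4):
    """Uniform symmetric quantization of one weight group.
    (Assumption: the exact quantizer is not described in this summary.)"""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale  # dequantized values

def glowq_sketch(W, group_size=128, rank=4, n_bits=4):
    """Hypothetical reading of 'group-shared low-rank approximation':
    quantize W in column groups, then fit one rank-r correction,
    shared across all groups, to the total quantization residual."""
    W_q = np.concatenate(
        [quantize_group(W[:, i:i + group_size], n_bits)
         for i in range(0, W.shape[1], group_size)], axis=1)
    R = W - W_q                                  # residual left by quantization
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    L = U[:, :rank] * s[:rank]                   # (d_out, r)
    Rt = Vt[:rank]                               # (r, d_in)
    return W_q, L, Rt                            # W ~= W_q + L @ Rt

# Usage: the rank-r correction should shrink reconstruction error
# relative to plain group-wise quantization.
W = np.random.randn(256, 512).astype(np.float32)
W_q, L, Rt = glowq_sketch(W)
print(np.linalg.norm(W - W_q), np.linalg.norm(W - (W_q + L @ Rt)))
```

Sharing one factor pair across groups is what would keep the memory overhead small: the correction adds only r * (d_out + d_in) parameters per layer, versus per-group factors that would scale with the number of groups.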
🟢 Perplexity