AIBullish · arXiv – CS AI · 4h ago · 7/10
SLaB: Sparse-Lowrank-Binary Decomposition for Efficient Large Language Models
Researchers propose SLaB, a framework for compressing large language models by decomposing weight matrices into sparse, low-rank, and binary components. The method outperforms existing compression techniques, reducing perplexity by up to 36% at a 50% compression rate without requiring model retraining.
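The core idea — approximating a weight matrix W as the sum of a sparse matrix S, a low-rank matrix L, and a scaled binary (sign) matrix B — can be sketched with alternating fits. This is a minimal illustration of that style of decomposition, not the paper's actual algorithm; the function name, sparsity/rank settings, and iteration scheme are all assumptions:

```python
import numpy as np

def slab_decompose(W, sparsity=0.05, rank=4, iters=5):
    """Hypothetical sketch of a sparse + low-rank + binary split.

    Alternately fit S (top-magnitude residual entries), L (truncated
    SVD of the residual), and B (scaled sign matrix of what remains).
    """
    S = np.zeros_like(W)
    L = np.zeros_like(W)
    B = np.zeros_like(W)
    k = max(1, int(sparsity * W.size))
    for _ in range(iters):
        # Sparse part: keep the k largest-magnitude residual entries.
        R = W - L - B
        idx = np.unravel_index(np.argsort(np.abs(R), axis=None)[-k:], W.shape)
        S = np.zeros_like(W)
        S[idx] = R[idx]
        # Low-rank part: rank-r truncated SVD of the new residual.
        R = W - S - B
        U, s, Vt = np.linalg.svd(R, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Binary part: a sign matrix with the least-squares scale
        # alpha = mean(|R|), as in classic weight binarization.
        R = W - S - L
        alpha = np.mean(np.abs(R))
        B = alpha * np.sign(R)
    return S, L, B

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
S, L, B = slab_decompose(W)
rel_err = np.linalg.norm(W - S - L - B) / np.linalg.norm(W)
```

Storing S (few nonzeros), L (two thin factors), and B (one bit per entry plus one scale) is far cheaper than storing W in full precision, which is the motivation for this decomposition family.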
Perplexity · Llama