AI · Bullish · arXiv – CS AI · 5h ago · 7/10
SoLA: Leveraging Soft Activation Sparsity and Low-Rank Decomposition for Large Language Model Compression
Researchers propose SoLA, a training-free compression method for large language models that combines soft activation sparsity with low-rank decomposition. The authors report that it compresses models while improving quality: at 30% compression on LLaMA-2-70B, perplexity falls from 6.95 to 4.44 and downstream task accuracy improves by 10%.
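
For readers unfamiliar with low-rank decomposition, the sketch below shows the general idea using a plain truncated SVD in NumPy. It is not SoLA's procedure, which the summary does not describe in detail. The matrix size, the rank, and the helper names `low_rank_factorize` and `compression_ratio` are assumptions made for illustration only.

```python
import numpy as np


def low_rank_factorize(weight, rank):
    """Approximate `weight` (m x n) as a @ b, with a (m x rank) and b (rank x n)."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # fold the kept singular values into the left factor
    b = vt[:rank, :]
    return a, b


def compression_ratio(shape, rank):
    """Fraction of parameters saved by storing the two factors instead of the matrix."""
    m, n = shape
    return 1.0 - rank * (m + n) / (m * n)


rng = np.random.default_rng(0)
# Hypothetical 64 x 48 weight matrix, constructed to have rank 8 exactly.
weight = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 48))

a, b = low_rank_factorize(weight, rank=8)
rel_error = np.linalg.norm(weight - a @ b) / np.linalg.norm(weight)
saved = compression_ratio(weight.shape, rank=8)
```

Storing `a` and `b` takes `rank * (m + n)` values instead of `m * n`, so the saving grows as the chosen rank shrinks. Real model weights are rarely exactly low rank, so in practice the truncation introduces some approximation error that a compression method has to manage.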