AIBullish — arXiv CS AI · Feb 27
Large Language Model Compression with Global Rank and Sparsity Optimization
Researchers propose a two-stage compression method for Large Language Models that uses global rank and sparsity optimization to reduce model size. The approach combines low-rank and sparse matrix decomposition with a probabilistic global allocation scheme that automatically detects redundancy across layers and accounts for interactions between the low-rank and sparse components.
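The core decomposition idea can be illustrated in isolation. The sketch below is not the paper's method (which allocates rank and sparsity globally across layers); it is a minimal per-matrix example, assuming a truncated-SVD low-rank term plus a magnitude-pruned sparse residual, with the function name and parameters chosen for illustration:

```python
import numpy as np

def low_rank_plus_sparse(W, rank, sparsity):
    """Approximate W ~ L + S, where L has rank `rank` (truncated SVD)
    and S keeps only the largest-magnitude entries of the residual."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    L = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
    R = W - L                                  # residual after low-rank fit
    k = int(sparsity * R.size)                 # number of entries S may keep
    S = np.zeros_like(R)
    if k > 0:
        flat = np.argsort(np.abs(R), axis=None)[-k:]   # top-k by magnitude
        idx = np.unravel_index(flat, R.shape)
        S[idx] = R[idx]
    return L, S

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
L, S = low_rank_plus_sparse(W, rank=8, sparsity=0.05)

# The combined L + S approximation should beat the low-rank term alone.
err_ls = np.linalg.norm(W - (L + S)) / np.linalg.norm(W)
err_lr = np.linalg.norm(W - L) / np.linalg.norm(W)
```

Storing `L` as two thin factors plus `S` in a sparse format is what yields the compression; the paper's contribution is choosing `rank` and `sparsity` per layer globally rather than by hand, as in this sketch.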