🧠 AI · 🟢 Bullish · Importance 7/10
SLaB: Sparse-Lowrank-Binary Decomposition for Efficient Large Language Models
🤖 AI Summary
Researchers propose SLaB, a novel framework for compressing large language models by decomposing weight matrices into sparse, low-rank, and binary components. The method achieves significant improvements over existing compression techniques, reducing perplexity by up to 36% at a 50% compression rate without requiring model retraining.
Key Takeaways
- SLaB decomposes each linear layer weight into three complementary components: sparse, low-rank, and binary matrices (see the sketch after this list).
- The framework eliminates the need for retraining models during compression.
- Testing on Llama-family models shows up to 36% perplexity reduction compared to existing methods at 50% compression.
- The method improves accuracy by up to 8.98% over baseline on zero-shot tasks.
- SLaB uses activation-aware pruning scores to guide the decomposition process.
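To make the idea concrete, here is a minimal sketch of a sparse + low-rank + binary split of a single weight matrix, guided by calibration activations. It is an illustration of the general structure described above, not the paper's actual algorithm: the function name `slab_decompose_sketch`, the magnitude-times-activation-norm scoring heuristic, and the SVD-then-sign ordering are all assumptions.

```python
import numpy as np

def slab_decompose_sketch(W, X, sparsity=0.05, rank=8):
    """Hypothetical sketch: split W (out_features x in_features) into
    sparse + low-rank + binary parts, using calibration inputs X
    (n_samples x in_features). Not the paper's exact method."""
    # Activation-aware importance score: weight magnitude scaled by the
    # norm of the corresponding input feature (assumed heuristic, in the
    # spirit of the activation-aware pruning scores mentioned above).
    col_norms = np.linalg.norm(X, axis=0)
    scores = np.abs(W) * col_norms[None, :]

    # Sparse component S: keep only the highest-scoring entries of W.
    k = int(sparsity * W.size)
    threshold = np.partition(scores.ravel(), -k)[-k]
    S = np.where(scores >= threshold, W, 0.0)

    # Low-rank component L: truncated SVD of what the sparse part misses.
    residual = W - S
    U, sigma, Vt = np.linalg.svd(residual, full_matrices=False)
    L = (U[:, :rank] * sigma[:rank]) @ Vt[:rank, :]

    # Binary component B: one scale alpha times the sign pattern of the
    # remaining residual (1 bit of storage per entry).
    residual2 = residual - L
    alpha = np.mean(np.abs(residual2))
    B = alpha * np.sign(residual2)

    return S, L, B

# Usage: reconstruct a random weight matrix and check the approximation error.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512)).astype(np.float32)
X = rng.standard_normal((128, 512)).astype(np.float32)
S, L, B = slab_decompose_sketch(W, X, sparsity=0.05, rank=16)
rel_err = np.linalg.norm(W - (S + L + B)) / np.linalg.norm(W)
print(f"relative reconstruction error: {rel_err:.3f}")
```

The appeal of this kind of decomposition is that each component is cheap to store and apply: the sparse part needs only the retained entries, the low-rank part needs two thin factors, and the binary part needs one scale plus a bitmask, which is why no retraining is required to recover most of the dense matrix.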
#llm #model-compression #sparse-computing #machine-learning #efficiency #llama #neural-networks #optimization
Read Original → via arXiv – CS AI