AI · Bullish · arXiv – CS AI · 18h ago · 7/10
STAR-KV: Low-Rank KV Cache Compression via Soft Thresholding for Adaptive Rank Control
Researchers introduce STAR-KV, an adaptive compression framework that cuts KV cache memory in large language models by up to 75% using low-rank projections, with soft thresholding to select the rank adaptively. Combined with quantization, it reaches up to 20x compression and speeds up attention computation, targeting a key bottleneck in LLM inference efficiency.
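
To make the idea of rank control by soft thresholding concrete, here is a minimal Python sketch. It is not the paper's implementation: the singular values, the threshold `tau`, the head dimension, and all function names are hypothetical, chosen only to illustrate how shrinking singular values toward zero yields an effective rank, and how that rank relates to the memory kept per token.

```python
def soft_threshold(singular_values, tau):
    """Shrink each singular value by tau and clamp at zero (generic soft thresholding)."""
    return [max(s - tau, 0.0) for s in singular_values]


def effective_rank(shrunk_values):
    """Count the singular values that survive thresholding."""
    return sum(1 for s in shrunk_values if s > 0.0)


def cache_fraction(rank, head_dim):
    """Fraction of the per-token KV footprint kept when storing a rank-sized
    latent vector instead of a head_dim-sized one. Assumption: the storage
    cost of the projection matrices themselves is ignored."""
    return rank / head_dim


# Hypothetical spectrum for one attention head with head_dim = 8.
spectrum = [9.0, 5.0, 2.5, 1.0, 0.4, 0.2, 0.1, 0.05]
shrunk = soft_threshold(spectrum, tau=2.5)   # hypothetical threshold
rank = effective_rank(shrunk)                # 2 values survive
kept = cache_fraction(rank, head_dim=len(spectrum))
print(rank, kept)                            # prints: 2 0.25
```

In this toy case, keeping 2 of 8 dimensions stores 25% of the original per-token cache, which corresponds to a 75% reduction. A larger `tau` lowers the rank and the memory kept, while a smaller `tau` preserves more of the spectrum.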