AIBullish · arXiv CS AI · 5h ago · 7/10
🧠
RetentiveKV: State-Space Memory for Uncertainty-Aware Multimodal KV Cache Eviction
RetentiveKV introduces an entropy-driven optimization method for multimodal large language models that achieves 5x KV cache compression and 1.5x decoding acceleration by reformulating token eviction as continuous memory evolution rather than discrete pruning. The approach targets two limitations of existing compression methods: they discard visual tokens that only gain importance later in decoding, and they break the spatial continuity of visual information.
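The abstract doesn't spell out the scoring rule, but the "uncertainty-aware" idea can be illustrated with a minimal sketch: score each cached key/value token by its accumulated attention mass plus an entropy bonus, so tokens whose importance is still diffuse (and may grow later in decoding) survive eviction. The function name, the scoring formula, and the `uncertainty_weight` knob below are all hypothetical, not the paper's actual RetentiveKV mechanism.

```python
import numpy as np

def evict_kv(attn, budget, uncertainty_weight=0.5):
    """Hypothetical entropy-aware KV cache eviction sketch.

    attn:   (num_queries, num_keys) attention weights, rows sum to 1.
    budget: number of KV entries to retain.
    Returns indices of retained keys, sorted to preserve token order.
    """
    eps = 1e-12
    # Cumulative attention mass each cached token has received so far.
    importance = attn.sum(axis=0)
    # Entropy of *which* queries attend to each token: high entropy means
    # its importance is spread out and may still rise later in decoding.
    col = attn / (attn.sum(axis=0, keepdims=True) + eps)
    entropy = -(col * np.log(col + eps)).sum(axis=0)
    # Keep the highest-scoring tokens within the cache budget.
    score = importance + uncertainty_weight * entropy
    keep = np.argsort(score)[-budget:]
    return np.sort(keep)
```

A real implementation would apply this per attention head inside the decoding loop; the sketch only shows the shape of an importance-plus-uncertainty eviction criterion.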