arXiv – CS AI · 6d ago

Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs

Researchers propose TRIM-KV, a novel approach that learns per-token importance for memory-bounded LLM inference through lightweight retention gates, addressing the quadratic cost of self-attention and the ever-growing key-value (KV) cache. The method outperforms existing cache-eviction baselines across multiple benchmarks, and the learned retention scores offer a window into LLM interpretability.
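The summary above describes scoring cached tokens with a learned gate and evicting the least important ones when memory is bounded. A minimal sketch of that idea, assuming a hypothetical sigmoid gate over key vectors and a fixed token budget (the gate weights, function names, and shapes here are illustrative, not the paper's actual design):

```python
import numpy as np

rng = np.random.default_rng(0)

def retention_scores(keys: np.ndarray, gate_w: np.ndarray) -> np.ndarray:
    """Lightweight retention gate: sigmoid of a linear projection of each key.

    In the paper's setting the gate would be trained; here gate_w is a
    stand-in for learned parameters.
    """
    return 1.0 / (1.0 + np.exp(-(keys @ gate_w)))

def evict_to_budget(keys, values, scores, budget):
    """Keep the `budget` highest-scoring cached tokens; evict the rest."""
    if keys.shape[0] <= budget:
        return keys, values, scores
    keep = np.argsort(scores)[-budget:]
    keep.sort()  # preserve original token order in the cache
    return keys[keep], values[keep], scores[keep]

# Toy demo: 16 cached tokens with 8-dim keys, memory budget of 4 tokens.
d, budget = 8, 4
keys = rng.normal(size=(16, d))
values = rng.normal(size=(16, d))
gate_w = rng.normal(size=d)  # hypothetical learned gate weights

scores = retention_scores(keys, gate_w)
keys, values, scores = evict_to_budget(keys, values, scores, budget)
print(keys.shape)  # cache now holds only the 4 highest-scoring tokens
```

At each decoding step the cache would be re-checked against the budget, so memory stays constant regardless of sequence length; the retention scores themselves can then be inspected to see which tokens the model considers worth keeping.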