AI · Bullish · arXiv – CS AI · 8h ago · 7/10

ICaRus: Identical Cache Reuse for Efficient Multi Model Inference

ICaRus introduces an architecture that lets multiple AI models share identical Key-Value (KV) caches, addressing the memory blow-up that plagues multi-model inference systems. By enabling cross-model cache reuse, it achieves up to 11.1x lower latency and 3.8x higher throughput while maintaining accuracy comparable to task-specific fine-tuned models.
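To make the core idea concrete, here is a minimal Python sketch of cross-model KV-cache reuse. This is a hypothetical illustration, not the paper's actual implementation: the `SharedKVCache` class, the `prefill` stand-in, and the fake key/value pairs are all invented for the example. The point it shows is that if two models can consume identical caches, the second model serving the same prompt prefix skips the expensive prefill entirely.

```python
class SharedKVCache:
    """Hypothetical shared store mapping a token prefix to its KV entries."""

    def __init__(self):
        self.store = {}   # tuple of prefix tokens -> simulated KV tensors
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, prefix, compute_fn):
        # Any model with an identical cache layout can reuse a stored entry.
        key = tuple(prefix)
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        kv = compute_fn(prefix)
        self.store[key] = kv
        return kv


def prefill(prefix):
    # Stand-in for an expensive attention prefill pass over the prompt.
    return [(tok, tok * 2) for tok in prefix]  # fake (key, value) per token


cache = SharedKVCache()
prompt = [101, 7592, 2088]  # shared token prefix for both models

kv_a = cache.get_or_compute(prompt, prefill)  # "model A": computes and stores
kv_b = cache.get_or_compute(prompt, prefill)  # "model B": reuses A's cache

print(cache.hits, cache.misses)
```

In a real system the savings come from skipping the prefill compute and from storing one cache instead of one per model, which is where the reported latency and throughput gains would originate.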