AIBullish · arXiv – CS AI · 6h ago · 7/10
🧠
Sparse Prefix Caching for Hybrid and Recurrent LLM Serving
Researchers propose sparse prefix caching, an optimization for serving hybrid and recurrent LLMs that stores exact model states only at selected checkpoint positions rather than caching the entire token history. Unlike attention KV caches, recurrent states summarize the whole prefix and cannot be truncated to match a shorter one, so a stored checkpoint lets the server replay only the tokens between that checkpoint and the requested prefix length. The method uses dynamic programming to determine optimal checkpoint placement and outperforms existing dense caching approaches on real-world datasets while storing fewer checkpoints.
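The summary doesn't give the paper's actual formulation, but here is a minimal sketch of how such a checkpoint-placement dynamic program could look, under an assumed linear cost model: resuming at a prefix position replays every token since the nearest checkpoint at or before it, with position 0 acting as an implicit free checkpoint (recompute from scratch). The function name `optimal_checkpoints` and its parameters (`resume_positions`, `weights`, `max_checkpoints`) are illustrative, not the paper's API.

```python
# Hypothetical sketch of a checkpoint-placement DP, NOT the paper's
# actual algorithm.  Assumed cost model: a request resuming at a prefix
# position replays every token since the nearest checkpoint at or
# before it; position 0 is an implicit free checkpoint.

def optimal_checkpoints(resume_positions, weights, max_checkpoints):
    """Choose up to max_checkpoints positions (drawn from resume_positions)
    minimizing the total weighted number of tokens replayed on resume."""
    assert max_checkpoints >= 1
    n = len(resume_positions)
    order = sorted(range(n), key=lambda i: resume_positions[i])
    pos = [0] + [resume_positions[i] for i in order]  # pos[0]: implicit start
    w = [0.0] + [weights[i] for i in order]

    def seg(a, b):
        # Replay cost of positions strictly between checkpoint indices a
        # and b, each recomputed from the state stored at pos[a].
        return sum(w[m] * (pos[m] - pos[a]) for m in range(a + 1, b))

    INF = float("inf")
    # dp[b][j]: min cost of positions 1..b with exactly j checkpoints,
    # the j-th stored at index b.  parent[b][j] backtracks the choice.
    dp = [[INF] * (max_checkpoints + 1) for _ in range(n + 1)]
    parent = [[0] * (max_checkpoints + 1) for _ in range(n + 1)]
    for b in range(1, n + 1):
        dp[b][1] = seg(0, b)
        for j in range(2, max_checkpoints + 1):
            for a in range(1, b):
                cand = dp[a][j - 1] + seg(a, b)
                if cand < dp[b][j]:
                    dp[b][j], parent[b][j] = cand, a

    # Close the tail (positions after the last checkpoint) and compare
    # against placing no checkpoints at all.
    best, best_b, best_j = seg(0, n + 1), 0, 0
    for b in range(1, n + 1):
        for j in range(1, max_checkpoints + 1):
            total = dp[b][j] + seg(b, n + 1)
            if total < best:
                best, best_b, best_j = total, b, j

    chosen, b, j = [], best_b, best_j
    while b:  # walk parent pointers back to the implicit start
        chosen.append(pos[b])
        b, j = parent[b][j], j - 1
    return best, sorted(chosen)


# Toy workload: four prefix lengths, the 2048-token prefix reused most.
cost, ckpts = optimal_checkpoints(
    resume_positions=[128, 512, 520, 2048],
    weights=[1.0, 3.0, 3.0, 10.0],
    max_checkpoints=2,
)
print(ckpts, cost)  # -> [512, 2048] 152.0
```

The sketch runs in O(n²·k) over the n candidate positions; the paper's real cost model and candidate set may differ.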