AI · Bullish · arXiv – CS AI · 18h ago · 7/10
Enabling KV Caching of Shared Prefix for Diffusion Language Models
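Researchers introduce bicache, a KV caching technique for efficiently serving diffusion language models (DLMs) on requests that share a common prefix. Autoregressive LLMs use causal attention, so a prefix's keys and values never depend on later tokens and can be cached once and reused. DLMs instead use bidirectional attention: prefix tokens also attend to the tokens that follow them, so the prefix's keys and values in deeper layers change with the suffix. Reusing them naively therefore breaks conventional caching and causes accuracy to collapse. Bicache dynamically identifies the layer depths at which the prefix cache can be safely reused, and the authors report throughput improvements of 36–98%.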
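To make the caching problem concrete, here is a minimal toy sketch in pure Python. It is not bicache: it assumes a made-up single-head attention stack (one shared projection `W` per layer for queries, keys, and values, plus a residual) and measures how much the prefix's per-layer K/V inputs change when only the suffix changes. With causal attention the drift is exactly zero at every layer. With bidirectional attention only layer 0 (which sees raw embeddings) is unaffected. The `reusable_depth` helper and its `tol` threshold are hypothetical illustrations, not bicache's published criterion; with `tol=0.0` this toy rule certifies only layer 0 under bidirectional attention, and the summary does not say what criterion bicache actually uses.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attention_layer(h, W, causal):
    """Toy single-head self-attention with a residual connection.

    Assumption: q, k and v all share one projection W, purely for brevity.
    """
    n, d = len(h), len(h[0])
    proj = [[sum(h[i][a] * W[a][b] for a in range(d)) for b in range(d)] for i in range(n)]
    out = []
    for i in range(n):
        # Causal: token i sees positions 0..i. Bidirectional: it sees every position.
        visible = range(i + 1) if causal else range(n)
        scores = [sum(proj[i][c] * proj[j][c] for c in range(d)) / math.sqrt(d) for j in visible]
        w = softmax(scores)
        ctx = [sum(wj * proj[j][c] for wj, j in zip(w, visible)) for c in range(d)]
        out.append([h[i][c] + ctx[c] for c in range(d)])
    return out


def prefix_kv_inputs(tokens, layers, causal, prefix_len):
    """For each layer, return the prefix rows of the hidden states that layer projects into K/V."""
    h = [list(t) for t in tokens]
    per_layer = []
    for W in layers:
        per_layer.append([row[:] for row in h[:prefix_len]])
        h = attention_layer(h, W, causal)
    return per_layer


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


random.seed(0)
D, N_LAYERS, PREFIX_LEN = 4, 3, 3
LAYERS = [[[random.gauss(0, 0.5) for _ in range(D)] for _ in range(D)] for _ in range(N_LAYERS)]
PREFIX = [[random.gauss(0, 1) for _ in range(D)] for _ in range(PREFIX_LEN)]
SUFFIX_A = [[random.gauss(0, 1) for _ in range(D)] for _ in range(2)]
SUFFIX_B = [[random.gauss(0, 1) for _ in range(D)] for _ in range(2)]


def drift_per_layer(causal):
    """Per-layer change in the shared prefix's K/V inputs when only the suffix differs."""
    sa = prefix_kv_inputs(PREFIX + SUFFIX_A, LAYERS, causal, PREFIX_LEN)
    sb = prefix_kv_inputs(PREFIX + SUFFIX_B, LAYERS, causal, PREFIX_LEN)
    return [max_abs_diff(x, y) for x, y in zip(sa, sb)]


def reusable_depth(drifts, tol=0.0):
    """Hypothetical rule: reuse cached prefix K/V for the leading layers whose drift <= tol."""
    depth = 0
    for dval in drifts:
        if dval > tol:
            break
        depth += 1
    return depth


if __name__ == "__main__":
    causal_drift = drift_per_layer(causal=True)
    bidir_drift = drift_per_layer(causal=False)
    print("causal drift per layer:       ", causal_drift, "reusable:", reusable_depth(causal_drift))
    print("bidirectional drift per layer:", bidir_drift, "reusable:", reusable_depth(bidir_drift))
```

Running it shows all-zero drift and a reusable depth of 3 for causal attention, but nonzero drift from layer 1 onward and a reusable depth of 1 for bidirectional attention. That gap is the one a method like bicache has to manage.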