y0news

#bidirectional-attention News & Analysis

1 article tagged with #bidirectional-attention. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 18h ago · 7/10

Enabling KV Caching of Shared Prefix for Diffusion Language Models

Researchers introduce bicache, a KV caching technique for efficiently serving diffusion language models (DLMs) whose requests share a prefix. Unlike traditional LLMs, DLMs use bidirectional attention, so reusing a cached prefix the way conventional KV caching does causes accuracy collapse. Bicache dynamically identifies the layer depths at which prefix reuse is safe, achieving 36-98% throughput improvements.
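
Why bidirectional attention breaks shared-prefix caching can be seen in a toy example. The sketch below is not bicache and is not taken from the paper: it is a single attention layer with identity projections, and the helper names `attention` and `prefix_drift` are hypothetical. It runs the same prefix with two different suffixes and measures how much the prefix's outputs change.

```python
import math
import random


def attention(x, causal):
    """Toy single-head self-attention with identity Q/K/V projections (an assumption)."""
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        # Causal: token i sees tokens 0..i. Bidirectional: token i sees every token.
        visible = range(i + 1) if causal else range(n)
        scores = [sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d) for j in visible]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * x[j][k] for w, j in zip(weights, visible))
                    for k in range(d)])
    return out


def prefix_drift(causal, prefix_len=3, suffix_len=2, dim=4, seed=0):
    """Largest change in the prefix outputs when only the suffix is swapped."""
    rng = random.Random(seed)

    def rand_tokens(n):
        return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]

    prefix = rand_tokens(prefix_len)
    out_a = attention(prefix + rand_tokens(suffix_len), causal)
    out_b = attention(prefix + rand_tokens(suffix_len), causal)
    return max(abs(a - b)
               for row_a, row_b in zip(out_a[:prefix_len], out_b[:prefix_len])
               for a, b in zip(row_a, row_b))


print("causal drift:       ", prefix_drift(causal=True))   # exactly 0.0
print("bidirectional drift:", prefix_drift(causal=False))  # nonzero
```

With a causal mask the prefix outputs are identical for both suffixes, so anything computed from them can be cached and shared. With bidirectional attention the prefix outputs depend on the suffix, so the inputs to the next layer's keys and values differ per request. In this toy setup the first layer's own keys and values still come from the raw prefix tokens and are unaffected; the mismatch only appears from the second layer onward, which gives some intuition for why a depth-dependent reuse policy is plausible.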