#bidirectional-attention News & Analysis

2 articles tagged with #bidirectional-attention. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

Enabling KV Caching of Shared Prefix for Diffusion Language Models

Researchers introduce Bicache, a KV caching technique that enables efficient serving of diffusion language models (DLMs) with shared prefixes. Unlike traditional autoregressive LLMs, DLMs use bidirectional attention, so prefix representations depend on the tokens that follow them; conventional prefix caching is therefore invalid, and applying it naively causes accuracy collapse. Bicache dynamically identifies the layer depths at which prefix reuse is safe, achieving 36-98% throughput improvements.
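
To illustrate why bidirectional attention breaks prefix reuse, here is a minimal toy sketch in pure Python. It is not the Bicache method: it assumes a single attention layer with identity Q/K/V projections, and the names `attention_layer` and `prefix_states` are hypothetical. It only checks whether the prefix's post-attention states (which would feed the next layer's keys and values) change when the suffix changes.

```python
import math

def attention_layer(xs, causal):
    """Toy single-head self-attention with identity Q/K/V projections (an assumption)."""
    n, d = len(xs), len(xs[0])
    out = []
    for i in range(n):
        # Causal: token i sees positions 0..i. Bidirectional: token i sees every position.
        visible = list(range(i + 1)) if causal else list(range(n))
        scores = [sum(a * b for a, b in zip(xs[i], xs[j])) / math.sqrt(d) for j in visible]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        out.append([sum(w / total * xs[j][k] for w, j in zip(weights, visible))
                    for k in range(d)])
    return out

def prefix_states(prefix, suffix, causal):
    """Post-attention states of the prefix tokens when followed by `suffix`."""
    return attention_layer(prefix + suffix, causal)[:len(prefix)]

def close(a, b, tol=1e-9):
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

prefix = [[1.0, 0.0], [0.0, 1.0]]   # shared prefix (toy embeddings)
suffix_a = [[1.0, 1.0]]             # one continuation
suffix_b = [[-1.0, 2.0]]            # a different continuation

# Causal attention: prefix states ignore the suffix, so cached values stay valid.
causal_same = close(prefix_states(prefix, suffix_a, True),
                    prefix_states(prefix, suffix_b, True))
# Bidirectional attention: prefix states shift with the suffix, so a cache goes stale.
bidir_same = close(prefix_states(prefix, suffix_a, False),
                   prefix_states(prefix, suffix_b, False))

print(causal_same, bidir_same)  # True False
```

In this toy setting the prefix states are reusable across requests only under the causal mask, which is consistent with the summary's point that a DLM needs a strategy for deciding where reuse remains safe.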

AI · Neutral · arXiv – CS AI · Jun 25 · 6/10

Improved Large Language Diffusion Models

Researchers introduce iLLaDA, an 8B masked diffusion language model trained with fully bidirectional attention rather than the causal attention used by standard autoregressive models. The model shows significant performance improvements over its predecessor LLaDA and remains competitive with similarly sized models such as Qwen2.5 7B, suggesting that bidirectional diffusion training is a viable alternative path to competitive language models.