#diffusion-language-models News & Analysis

6 articles tagged with #diffusion-language-models. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

Streaming-dLLM: Accelerating Diffusion LLMs via Suffix Pruning and Dynamic Decoding

Researchers introduce Streaming-dLLM, a training-free optimization framework that accelerates Diffusion Language Models by up to 68.2× through spatial suffix pruning and dynamic temporal decoding. The approach maintains generation quality while addressing inefficiencies inherent in block-wise diffusion, making parallel decoding models more computationally practical.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

Enabling KV Caching of Shared Prefix for Diffusion Language Models

Researchers introduce bicache, a novel KV caching technique that enables efficient serving of diffusion language models (DLMs) with shared prefixes. Unlike traditional LLMs, DLMs use bidirectional attention, which invalidates conventional caching methods and causes accuracy collapse. Bicache dynamically identifies safe layer depths for prefix reuse, achieving 36-98% throughput improvements.

AI · Bullish · arXiv – CS AI · Jun 5 · 7/10

A Survey on Diffusion Language Models

A comprehensive survey examines Diffusion Language Models (DLMs), an emerging alternative to autoregressive language models that generates text through parallel iterative denoising. According to the survey, DLMs achieve significant inference speedups while maintaining performance comparable to autoregressive models, and they enable better bidirectional context understanding and generation control.

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

Diffusion Language Models: An Experimental Analysis

Researchers present a systematic experimental analysis comparing eight state-of-the-art Diffusion Language Models (DLMs) across eight benchmarks to evaluate their performance and computational efficiency. The study finds that DLMs, which generate text through iterative denoising rather than autoregressive next-token prediction, exhibit distinct trade-offs that depend heavily on inference-time design choices such as the number of denoising steps and the parallel unmasking strategy.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

DLM-SWAI: Steering Diffusion Language Models Before They Unmask

Researchers propose DLM-SWAI, a training-free method for steering diffusion language models toward desired outputs by biasing token distributions during iterative denoising. The approach enables controllable text generation for style and safety applications without retraining or auxiliary models, addressing a gap in control methods for diffusion-based language generation.

AI · Neutral · arXiv – CS AI · Apr 20 · 6/10

DepCap: Adaptive Block-Wise Parallel Decoding for Efficient Diffusion LM Inference

Researchers introduce DepCap, a training-free framework that optimizes diffusion language model (DLM) inference through adaptive block-wise parallel decoding. The method achieves up to 5.63× speedup by using cross-step signals to determine block boundaries and identifying conflict-free token subsets for safe parallel execution, maintaining quality while significantly accelerating inference.