AI · Neutral · arXiv – CS AI · Apr 20 · 6/10
DepCap: Adaptive Block-Wise Parallel Decoding for Efficient Diffusion LM Inference
Researchers introduce DepCap, a training-free framework that accelerates diffusion language model (DLM) inference through adaptive block-wise parallel decoding. It uses cross-step signals to decide where each block ends and identifies conflict-free subsets of tokens that can safely be decoded in parallel, achieving up to 5.63× speedup while maintaining quality.
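The summary does not say how DepCap defines its cross-step signal or its conflict test, so the following is only a minimal, hypothetical Python sketch of the two ideas under loudly stated assumptions. It assumes (a) the cross-step signal is whether a position's predicted token stayed the same between two consecutive denoising steps, and (b) two tokens "conflict" when a pairwise dependency score `dep[i][j]` exceeds a threshold `tau`. The names `block_end`, `conflict_free_subset`, `dep`, `tau`, and `min_conf` are invented for illustration and are not DepCap's API or its actual algorithm.

```python
from typing import List, Sequence


def block_end(prev_preds: Sequence[int], curr_preds: Sequence[int],
              start: int, max_len: int) -> int:
    """Hypothetical adaptive block boundary.

    Extends the block from `start` while the predicted token at each position
    agrees across two consecutive steps (an assumed cross-step signal).
    Always returns at least start + 1 so decoding makes progress.
    """
    if start >= len(curr_preds):
        return start
    limit = min(len(curr_preds), start + max_len)
    end = start
    while end < limit and prev_preds[end] == curr_preds[end]:
        end += 1
    return max(end, start + 1)


def conflict_free_subset(positions: Sequence[int], confidence: Sequence[float],
                         dep: Sequence[Sequence[float]], tau: float,
                         min_conf: float) -> List[int]:
    """Hypothetical greedy selection of tokens to decode in parallel.

    Visits candidate positions in decreasing confidence and keeps a position
    only if its assumed dependency score with every already-kept position is
    at most `tau`. The most confident position is always kept; later ones
    must also reach `min_conf`.
    """
    order = sorted(positions, key=lambda p: confidence[p], reverse=True)
    chosen: List[int] = []
    for p in order:
        # Sorted descending, so once one falls below min_conf, all later do too.
        if chosen and confidence[p] < min_conf:
            break
        if all(dep[p][q] <= tau and dep[q][p] <= tau for q in chosen):
            chosen.append(p)
    return sorted(chosen)


if __name__ == "__main__":
    prev = [5, 7, 9, 2, 4, 8]
    curr = [5, 7, 9, 3, 4, 8]
    end = block_end(prev, curr, start=0, max_len=8)
    conf = [0.9, 0.8, 0.95, 0.4, 0.6, 0.7]
    dep = [[0.1] * 6 for _ in range(6)]
    dep[0][2] = dep[2][0] = 0.7  # positions 0 and 2 are assumed to conflict
    print(end, conflict_free_subset(range(0, end), conf, dep, tau=0.5, min_conf=0.5))
```

In this toy run the block covers positions 0 to 2 because position 3 changed between steps. Position 2 is kept first as the most confident, position 0 is skipped for conflicting with it, and position 1 is kept, so positions 1 and 2 would be decoded together in this step. A real system would derive confidences and dependency scores from the model itself; here they are hand-set placeholders.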