AI · Neutral · arXiv – CS AI · 7h ago · 6/10
DepCap: Adaptive Block-Wise Parallel Decoding for Efficient Diffusion LM Inference
Researchers introduce DepCap, a training-free framework that optimizes diffusion language model (DLM) inference through adaptive block-wise parallel decoding. The method achieves up to 5.63× speedup by using cross-step signals to determine block boundaries and identifying conflict-free token subsets for safe parallel execution, maintaining quality while significantly accelerating inference.