AI · Bullish · arXiv – CS AI · 14h ago · 6/10
BlockBatch: Multi-Scale Consensus Decoding for Efficient Diffusion Language Model Inference
BlockBatch is a training-free inference framework for diffusion language models that runs several decoding branches with different block sizes simultaneously. It reports a 26.6% reduction in computational steps and a 1.33x speedup over existing methods. The approach exploits the complementary strengths of different decoding granularities to balance parallelism against accuracy while managing the trade-offs inherent in block-wise inference.
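As a rough sanity check on how the two figures quoted above relate, the sketch below converts a step reduction into the speedup it would imply, and a speedup into the equivalent time reduction. It assumes every remaining step costs the same as before and that both figures are measured against the same baseline; the summary states neither, and the function names are hypothetical.

```python
def speedup_from_step_reduction(reduction: float) -> float:
    """Ideal speedup if each remaining step costs the same as before (assumption)."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def time_reduction_from_speedup(speedup: float) -> float:
    """Fraction of wall-clock time saved for a given speedup factor."""
    if speedup <= 0.0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


ideal_speedup = speedup_from_step_reduction(0.266)   # figure from the summary
time_saved = time_reduction_from_speedup(1.33)       # figure from the summary

print(f"Ideal speedup from 26.6% fewer steps: {ideal_speedup:.2f}x")  # 1.36x
print(f"Time saved at 1.33x speedup: {time_saved:.1%}")               # 24.8%
```

Under these assumptions the step reduction alone would cap the speedup at about 1.36x, so a reported 1.33x is in the same range. Any remaining gap could come from per-step cost differences or from differing baselines, neither of which the summary specifies.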