Parallelism and Generation Order in Masked Diffusion Language Models: Limits Today, Potential Tomorrow
Researchers evaluated eight large Masked Diffusion Language Models (MDLMs, up to 100B parameters) and found that they still underperform comparable autoregressive models, despite the promise of parallel token generation. The study shows that MDLMs exhibit task-dependent decoding behavior and proposes a Generate-then-Edit paradigm to improve performance while retaining the efficiency of parallel processing.
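To make the idea of parallel token generation concrete, here is a minimal sketch of confidence-based parallel unmasking, a common decoding heuristic for masked diffusion models: at each step the model scores all masked positions and the most confident predictions are committed simultaneously. The names `toy_model` and `parallel_decode` are hypothetical, and the stand-in model is purely illustrative, not the method evaluated in the study.

```python
import random

MASK = "<mask>"

def toy_model(tokens):
    """Stand-in for an MDLM forward pass (hypothetical): for each masked
    position, return a (predicted_token, confidence) pair."""
    vocab = ["the", "cat", "sat", "on", "mat"]
    rng = random.Random(0)  # fixed seed so the sketch is deterministic
    return {i: (vocab[i % len(vocab)], rng.random())
            for i, t in enumerate(tokens) if t == MASK}

def parallel_decode(length, tokens_per_step=2):
    """Confidence-based parallel unmasking: each step commits the
    `tokens_per_step` most confident masked positions at once."""
    tokens = [MASK] * length
    steps = 0
    while MASK in tokens:
        preds = toy_model(tokens)
        # Sort masked positions by confidence, commit the top few in parallel.
        ranked = sorted(preds.items(), key=lambda kv: kv[1][1], reverse=True)
        for i, (tok, _conf) in ranked[:tokens_per_step]:
            tokens[i] = tok
        steps += 1
    return tokens, steps

out, steps = parallel_decode(5, tokens_per_step=2)
# Five masked positions filled two at a time take 3 steps instead of 5,
# which is the speedup parallel decoding is meant to deliver.
```

A Generate-then-Edit variant would add a second pass over `out` that re-masks and revises low-confidence tokens, trading a few extra steps for quality.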

