AI · Bullish
CoDAR: Continuous Diffusion Language Models are More Powerful Than You Think
AI Summary
Researchers propose CoDAR, a continuous diffusion language model framework that targets the main bottleneck of such models, rounding denoised embeddings back into discrete tokens, with a two-stage approach that pairs continuous diffusion with an autoregressive decoder. The model substantially improves generation quality over existing latent diffusion methods and becomes competitive with discrete diffusion language models.
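As a rough illustration of the described two-stage pipeline, a minimal PyTorch sketch follows. All module names, sizes, and the simplified reverse process here are assumptions for exposition, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class CoDARSketch(nn.Module):
    """Hypothetical two-stage sketch: continuous diffusion in embedding
    space, followed by an autoregressive decoder that cross-attends to
    the denoised embeddings for contextualized token rounding."""

    def __init__(self, vocab_size=32000, d_model=512):
        super().__init__()
        # Stage 1: denoiser operating on continuous token embeddings.
        self.denoiser = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=6,
        )
        # Stage 2: autoregressive decoder that cross-attends to the
        # denoised embedding sequence (the context-conditional
        # discretizer performing contextualized token rounding).
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=4,
        )
        self.lm_head = nn.Linear(d_model, vocab_size)

    def denoise(self, z, num_steps=50):
        # Placeholder reverse process: repeatedly refine the noisy
        # embeddings. A real diffusion sampler would condition on the
        # timestep and follow a proper noise schedule.
        for _ in range(num_steps):
            z = self.denoiser(z)
        return z

    def rounding_logits(self, z, prev_tokens):
        # The decoder reads the whole denoised embedding sequence via
        # cross-attention ("memory") while predicting tokens left to
        # right under a causal mask.
        tgt = self.token_emb(prev_tokens)
        t = tgt.size(1)
        causal = torch.triu(
            torch.full((t, t), float("-inf"), device=tgt.device), diagonal=1
        )
        h = self.decoder(tgt, memory=z, tgt_mask=causal)
        return self.lm_head(h)
```

In this reading, sampling would start from Gaussian noise of shape (batch, seq_len, d_model), denoise it with stage 1, then decode tokens autoregressively from the stage-2 rounding logits.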
Key Takeaways
- Token rounding, from denoised embeddings back to discrete tokens, is identified as the primary bottleneck limiting continuous diffusion language models.
- CoDAR is a two-stage framework: diffusion stays continuous in embedding space, while a context-conditional discretizer handles discretization.
- An autoregressive Transformer decoder cross-attends to the denoised embedding sequence to perform contextualized token rounding.
- Experiments on LM1B and OpenWebText show CoDAR substantially outperforms latent diffusion baselines and competes with strong discrete diffusion models.
- A decoder-temperature control balances the fluency-diversity trade-off in text generation (sketched below).
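The decoder-temperature control in the last takeaway can be read as standard temperature sampling applied only to the autoregressive rounding decoder, leaving the diffusion stage untouched. How the knob is wired in is an assumption; a minimal sketch:

```python
import torch

def decode_step(logits, temperature=1.0):
    # Temperature-scaled sampling over decoder logits of shape
    # (batch, vocab). Lower temperature sharpens the distribution
    # (more fluent, less diverse); higher temperature flattens it
    # (more diverse, potentially less coherent).
    if temperature <= 0:
        return logits.argmax(dim=-1)  # greedy decoding as the limit case
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).squeeze(-1)
```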
#diffusion-models #language-models #natural-language-processing #machine-learning #text-generation #transformer #autoregressive #continuous-diffusion #codar
Read Original · via arXiv · CS AI