#masked-diffusion News & Analysis

3 articles tagged with #masked-diffusion. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Jun 25 · 6/10

Improved Large Language Diffusion Models

Researchers introduce iLLaDA, an 8B masked diffusion language model trained with fully bidirectional attention rather than the causal attention used in standard autoregressive models. It improves significantly on its predecessor LLaDA and remains competitive with comparably sized autoregressive models such as Qwen2.5 7B, suggesting that bidirectional diffusion training is a viable alternative path to competitive language models.

AI · Bullish · arXiv – CS AI · Jun 10 · 6/10

Attention-Discounted Adaptive Sampler for Masked Diffusion Language Models

Researchers propose ADAS, a training-free reranking algorithm that improves parallel token decoding in masked diffusion language models. It uses attention weights as soft penalties so that the sampler avoids committing to correlated predictions in the same step. The method achieves improvements of 9-10 percentage points on benchmarks such as GSM8K and HumanEval with minimal computational overhead, making fast parallel inference more practical.
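
To make the idea of an attention-based soft penalty concrete, here is a minimal Python sketch of a greedy selection step. It is an illustration under stated assumptions, not the ADAS algorithm from the paper: the function name `select_positions`, the `penalty` parameter, and the dictionary-based inputs are all hypothetical.

```python
def select_positions(confidence, attention, budget, penalty=1.0):
    """Greedily pick up to `budget` masked positions to commit in one step.

    Hypothetical illustration, not the paper's method.
    confidence: dict mapping position -> confidence in its top prediction
    attention:  dict mapping (i, j) -> attention weight between positions i and j
    """
    remaining = dict(confidence)
    chosen = []
    while remaining and len(chosen) < budget:
        def score(pos):
            # Soft penalty: total attention coupling to positions already chosen.
            coupling = sum(
                attention.get((pos, c), 0.0) + attention.get((c, pos), 0.0)
                for c in chosen
            )
            return remaining[pos] - penalty * coupling

        best = max(remaining, key=score)
        chosen.append(best)
        del remaining[best]
    return chosen


confidence = {0: 0.9, 1: 0.85, 2: 0.6}
attention = {(0, 1): 0.5}  # positions 0 and 1 attend strongly to each other
print(select_positions(confidence, attention, budget=2))
```

In this toy input, position 1 has the second-highest confidence, but its strong attention link to position 0 lowers its score, so the less confident but weakly coupled position 2 is committed alongside position 0.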

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

Tuning the Implicit Regularizer of Masked Diffusion Language Models: Enhancing Generalization via Insights from $k$-Parity

Researchers demonstrate that the masked diffusion (MD) training objective fundamentally alters neural network learning dynamics on the k-parity problem, eliminating the typical grokking phenomenon and enabling faster generalization. By decomposing the MD objective into signal and noise regimes, they optimize the mask probability distribution, achieving performance improvements of up to 8.8% on 50M-parameter models and gains of 5.8% on 8B-parameter models.
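
For readers unfamiliar with the task, the sketch below shows what a k-parity example looks like and how a mask probability would corrupt it. The label is the XOR of k fixed input bits. How the paper encodes the task for a masked diffusion model is not stated in this summary, so the sequence layout (bits followed by the label), the `mask_token` value, and all function names here are assumptions.

```python
import random


def parity_label(bits, support):
    """Parity (XOR) of the bits at the k indices listed in `support`."""
    return sum(bits[i] for i in support) % 2


def make_example(n, support, rng):
    """Sample n random bits and append their k-parity label (assumed layout)."""
    bits = [rng.randint(0, 1) for _ in range(n)]
    return bits + [parity_label(bits, support)]


def mask_sequence(seq, mask_prob, rng, mask_token=-1):
    """Independently replace each token with `mask_token` with probability `mask_prob`."""
    return [mask_token if rng.random() < mask_prob else tok for tok in seq]


rng = random.Random(0)
support = [0, 3, 5]  # k = 3 relevant bits out of n = 8
example = make_example(8, support, rng)
print(example, mask_sequence(example, mask_prob=0.5, rng=rng))
```

Varying `mask_prob` changes how often the label and the relevant bits are hidden together, which is the kind of knob the summary describes the authors tuning.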