What Gets Unmasked First? Trajectory Analysis of Diffusion Models for Graph-to-Text Generation
Researchers present the first systematic study of masked diffusion language models (MDLMs) for graph-to-text generation, finding that these models tend to unmask entities before relational words and structural tokens. The study identifies a failure mode in which supervised fine-tuning anchors structural tokens prematurely, proposes lambda-scaled structural decoding to recover the lost performance, and introduces Graph-LLaDA to improve generalization across datasets.
This research addresses a gap in understanding how masked diffusion language models generate text from structured data, specifically graphs. Unlike traditional autoregressive language models, which generate text sequentially from left to right, MDLMs start from a masked sequence and iteratively unmask tokens in a learned order, so the order in which content appears can differ sharply from left-to-right generation. The study's finding that MDLMs unmask semantic content (entities) before structural elements points to a built-in preference for committing to informative tokens first, an order that differs markedly from how humans produce language.
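The unmasking order described above can be illustrated with a toy sketch. This is not the authors' code: the function name `unmask_trajectory`, the example sentence, and the fixed confidence scores are all assumptions made for illustration. A real MDLM would recompute its confidences with a forward pass at every step; here they are held constant so the resulting trajectory is easy to read.

```python
MASK = "<mask>"

def unmask_trajectory(target, confidence, tokens_per_step=1):
    """Toy confidence-ordered unmasking (illustrative assumption, not the paper's code).

    target:     tokens a hypothetical model would predict at each position
    confidence: fixed per-position scores standing in for model probabilities
    Returns a list of (step, position, token) in the order tokens are revealed.
    """
    seq = [MASK] * len(target)
    trajectory = []
    step = 0
    while MASK in seq:
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        # Reveal the most confident masked positions first.
        chosen = sorted(masked, key=lambda i: confidence[i], reverse=True)[:tokens_per_step]
        for i in chosen:
            seq[i] = target[i]
            trajectory.append((step, i, target[i]))
        step += 1
    return trajectory

# Hypothetical example: entities get high scores, punctuation gets low scores.
target = ["Alan", "Turing", "was", "born", "in", "London", "."]
confidence = [0.95, 0.93, 0.40, 0.70, 0.45, 0.90, 0.30]

for step, pos, tok in unmask_trajectory(target, confidence):
    print(f"step {step}: position {pos} -> {tok!r}")
```

With these made-up scores the entity tokens ("Alan", "Turing", "London") are revealed first and the final period last, which mirrors the entity-first pattern the study reports. Logging which token category is revealed at each step is one simple way to inspect a trajectory.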
The finding that supervised fine-tuning (SFT) can disrupt these generation trajectories is a key technical insight for model developers. By fixing structural tokens too early, SFT inadvertently constrains output length and information capacity, leading to hallucinations or omissions. This finding challenges conventional training approaches and suggests that fine-tuning methods need recalibration for diffusion-based architectures. The proposed lambda-scaled structural decoding offers a practical, inference-time remedy: it downweights the confidence of structural tokens during decoding, requires no retraining, and is reported to recover 9.4 BLEU-4 points.
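A minimal sketch of the idea behind lambda-scaled structural decoding follows. This summary does not give the exact formulation, so the multiplicative scaling, the `STRUCTURAL` token set, and the function names `scale_confidence` and `pick_next` are assumptions for illustration only. The sketch shows how downweighting structural-token confidence can change which position is unmasked next.

```python
# Assumption: which tokens count as "structural" is defined by the paper;
# this small set is only a placeholder for the example.
STRUCTURAL = {"<eos>", ".", ","}

def scale_confidence(candidates, lam, structural=STRUCTURAL):
    """Multiply the confidence of structural tokens by lam (0 < lam <= 1).

    candidates: list of (position, token, confidence) for masked positions.
    Non-structural tokens keep their confidence unchanged.
    """
    return [
        (pos, tok, conf * lam if tok in structural else conf)
        for pos, tok, conf in candidates
    ]

def pick_next(candidates, lam=1.0):
    """Return the candidate with the highest (scaled) confidence."""
    return max(scale_confidence(candidates, lam), key=lambda c: c[2])

# Hypothetical predictions at one decoding step: the model is very sure
# about an early end-of-sequence token, which would cap the output length.
candidates = [(0, "Alan", 0.80), (3, "<eos>", 0.95), (1, "Turing", 0.75)]

print(pick_next(candidates, lam=1.0))  # no scaling: the structural token wins
print(pick_next(candidates, lam=0.5))  # scaled: an entity is unmasked first
```

Because the scaling is applied only when ranking candidates at inference time, a scheme like this leaves the model weights untouched, consistent with the retraining-free claim above.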
Graph-LLaDA addresses a key limitation in graph-to-text tasks by incorporating explicit graph structure into the decoding process through a Graph Transformer encoder. The cross-dataset evaluation shows that previous baselines overfit to dataset-specific patterns, while LLM and MDLM approaches generalize better, which suggests a shift toward more robust generation methods. This work has implications for knowledge graph summarization, semantic parsing, and other structured data-to-text applications where both semantic accuracy and structural validity matter.
- MDLMs generate text by prioritizing entities and relational words before structural tokens, contrasting sharply with linear autoregressive generation patterns.
- Supervised fine-tuning can disrupt optimal MDLM generation trajectories by prematurely anchoring structural tokens, reducing information capacity.
- Lambda-scaled structural decoding recovers +9.4 BLEU-4 performance without retraining by downweighting structural token confidence at inference time.
- Graph-LLaDA integrates explicit graph structure into diffusion decoding, improving generalization across diverse datasets compared to dataset-specific baselines.
- MDLMs and LLM-based approaches demonstrate superior cross-dataset generalization compared to traditional graph-to-text models that overfit to specific patterns.