Where to Place the Query? Unveiling and Mitigating Positional Bias in In-Context Learning for Diffusion LLMs via Decoding Dynamics
Researchers demonstrate that query placement significantly impacts performance in Diffusion Large Language Models (dLLMs) during in-context learning, contrary to conventional practices inherited from autoregressive models. The study reveals a spatial recency effect in attention mechanisms and proposes Auto-ICL, a training-free strategy that dynamically optimizes query positioning to approach oracle performance across diverse tasks.
This research addresses an architectural difference between diffusion-based and autoregressive (AR) language models that is often overlooked in practice. AR models enforce unidirectional causal masking, which constrains where the query can be placed; diffusion models use bidirectional attention, so the query can sit anywhere relative to the demonstrations. The study's core finding, that query position rivals the semantic quality of the examples in importance, suggests that applying AR-derived templates unchanged to dLLMs leaves substantial performance gains on the table.
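The masking difference described above can be illustrated with a toy sketch. It is not taken from the paper: the helper names (`causal_mask`, `bidirectional_mask`, `visible_to`) and the three-segment layout are assumptions made for illustration, and each segment is treated as a single position for simplicity.

```python
def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (only j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Every position may attend to every other position."""
    return [[True] * n for _ in range(n)]


def visible_to(mask, layout, name):
    """Return the segments that the segment called `name` can attend to."""
    i = layout.index(name)
    return [layout[j] for j in range(len(layout)) if mask[i][j]]


# Hypothetical layout: the query is placed before two demonstrations.
layout = ["query", "demo_1", "demo_2"]

print(visible_to(causal_mask(3), layout, "query"))         # ['query']
print(visible_to(bidirectional_mask(3), layout, "query"))  # ['query', 'demo_1', 'demo_2']
```

Under the causal mask, a query placed first cannot see the demonstrations that follow it, which is one reason AR templates conventionally put the query last. Under the bidirectional mask the same layout remains usable, so position becomes a free choice.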
The decoding dynamics analysis traces this sensitivity to a spatial recency effect: attention patterns shift with query location, which in turn alters generation trajectories. This makes query placement a first-order design variable rather than a minor implementation detail. The proposed Average Confidence metric is a methodological contribution in its own right. Because traditional single-step confidence scoring is inadequate for iterative decoding, the metric instead captures cumulative information across multiple inference steps.
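The summary does not give the exact formula for Average Confidence, so the following is only a minimal sketch under stated assumptions: that a decoding trajectory can be recorded as, for each step, the probabilities the model assigned to the tokens it committed at that step, and that the metric averages over steps. The function names and the example numbers are hypothetical.

```python
from statistics import mean


def average_confidence(step_probs):
    """Average per-step confidence over the whole decoding trajectory.

    step_probs: one list per decoding step, holding the probabilities of the
    tokens committed at that step (an assumed recording format).
    """
    per_step = [mean(probs) for probs in step_probs if probs]
    if not per_step:
        raise ValueError("trajectory contains no committed tokens")
    return mean(per_step)


def final_step_confidence(step_probs):
    """Single-step baseline: confidence at the last decoding step only."""
    return mean(step_probs[-1])


# Hypothetical trajectory: uncertain early steps, confident final step.
trajectory = [[0.40, 0.50], [0.60], [0.95, 0.90]]

print(average_confidence(trajectory))
print(final_step_confidence(trajectory))
```

In this toy trajectory the final step looks confident even though earlier steps were not, which is the kind of information a single-step score would miss and a trajectory-level average would retain.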
For the AI development community, this work offers a route to better performance without retraining or labeled data. Because Auto-ICL is training-free, it can be applied to existing dLLM deployments. Its robustness across heterogeneous tasks, spanning reasoning and perception, suggests principles that could generalize beyond current model families. Practitioners implementing or evaluating diffusion language models should reconsider their prompt engineering conventions, since naive query placement likely degrades performance. The research establishes baselines for spatial in-context learning that future work can build on, and it may reshape best practices in prompt design for bidirectional attention models.
- Query position is a first-order performance variable in diffusion LLMs, with impact comparable to example semantic quality.
- Spatial recency effects in attention mechanisms cause positional sensitivity that varies across different task types.
- Traditional single-step confidence metrics fail to capture decoding dynamics in diffusion models; Average Confidence provides better calibration.
- Auto-ICL offers a training-free solution that dynamically optimizes query placement and approaches oracle performance without ground-truth labels.
- Current practices inappropriately transfer autoregressive prompt templates to diffusion models despite fundamental architectural differences in attention mechanisms.
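The label-free placement idea in the takeaways can be sketched as a simple selection loop. This is not the authors' implementation: it assumes that candidate query positions are scored with a confidence signal such as Average Confidence and that the highest-scoring placement is kept. `build_prompt`, `select_query_position`, and `toy_score` are hypothetical names, and `toy_score` is a stand-in for a real dLLM decoding pass.

```python
def build_prompt(demos, query, position):
    """Insert the query at index `position` among the demonstrations."""
    return demos[:position] + [query] + demos[position:]


def select_query_position(demos, query, score_fn):
    """Score every candidate placement and return the best one with all scores.

    score_fn maps a prompt (list of segments) to a label-free confidence score;
    in practice it would run the model, here it is supplied by the caller.
    """
    scores = {
        pos: score_fn(build_prompt(demos, query, pos))
        for pos in range(len(demos) + 1)
    }
    best = max(scores, key=scores.get)
    return best, scores


def toy_score(prompt):
    """Stand-in scorer (assumption): arbitrarily prefers earlier query placement."""
    return 1.0 / (1 + prompt.index("Q"))


demos = ["D1", "D2", "D3"]
best, scores = select_query_position(demos, "Q", toy_score)
print(best, build_prompt(demos, "Q", best))
```

No ground-truth labels enter the loop; the only signal is the score returned for each placement, which matches the training-free, label-free framing in the summary.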