TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM
Researchers introduce TAD, a temporal-aware self-distillation framework that improves diffusion large language models' accuracy-parallelism trade-off by using adaptive loss functions based on token decoding timelines. The method increases accuracy from 46.2% to 51.6% while enabling aggressive acceleration modes, addressing a fundamental limitation in parallel text generation.