Diffusion-Inspired Masked Fine-Tuning for Knowledge Injection in Autoregressive LLMs
Researchers demonstrate that masked fine-tuning, a demasking objective borrowed from diffusion models, significantly improves knowledge injection in autoregressive LLMs: it removes the need for expensive paraphrase augmentation and resists the reversal curse. The technique closes the performance gap between autoregressive and diffusion language models, with benefits that extend to math tasks and large-scale knowledge-intensive benchmarks.
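To make the idea concrete, here is a minimal sketch of what such a demasking objective could look like in PyTorch. The details are assumptions for illustration, not from the paper: a fixed masking rate, corruption by replacing tokens with a single mask id, a HuggingFace-style model whose output exposes `.logits`, and the hypothetical helper name `masked_finetune_step`.

```python
# Minimal sketch of diffusion-style masked fine-tuning for a causal LM.
# Assumption: a random fraction of input tokens is replaced with a mask
# token, and the model is trained with the ordinary next-token
# cross-entropy against the *clean* (uncorrupted) targets.
import torch
import torch.nn.functional as F

def masked_finetune_step(model, input_ids, mask_token_id, mask_prob=0.15):
    """One training step: corrupt inputs, predict original tokens."""
    # Sample a Bernoulli corruption mask over positions; keep position 0
    # unmasked so the model always sees at least one clean context token.
    corrupt = torch.rand_like(input_ids, dtype=torch.float) < mask_prob
    corrupt[:, 0] = False
    corrupted = input_ids.masked_fill(corrupt, mask_token_id)

    # Standard left-to-right loss, but the targets are the clean tokens:
    # the model must reconstruct masked content from remaining context.
    logits = model(corrupted).logits  # (batch, seq, vocab)
    loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        input_ids[:, 1:].reshape(-1),
    )
    loss.backward()
    return loss.item()
```

Note that in this sketch the corruption touches only the inputs; the loss stays the usual causal cross-entropy, which is what would let a standard autoregressive model adopt the objective without any architectural changes.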