From AR to Diffusion: Efficiently Adapting Large Language Models with Strictly Causal and Elastic Horizons
Researchers introduce FLUID, a framework that adapts autoregressive language models to diffusion-based text generation by enforcing strictly causal attention patterns, eliminating the need for expensive retraining from scratch. The approach incorporates Elastic Horizons, a dynamic denoising mechanism that improves efficiency and achieves state-of-the-art performance while reducing training costs significantly.