AIBullisharXiv – CS AI · 7h ago6/10
🧠
FLARE: Diffusion for Hybrid Language Model
Researchers introduce FLARE, a conversion framework that enables large language models with hybrid attention mechanisms to function as both autoregressive and diffusion models, addressing a key limitation in parallel decoding while maintaining model capability. The approach demonstrates competitive performance with existing diffusion language models while delivering throughput gains in concurrent serving scenarios.