AI · Bullish · arXiv – CS AI · Jun 10 · 7/10
🧠Whisfusion introduces a masked diffusion decoder that achieves faster speech-to-text processing than Whisper-large-v3 while matching or exceeding its accuracy across multilingual benchmarks. By replacing autoregressive decoding with parallel diffusion decoding, the system runs 4-5x faster while maintaining competitive performance with leading ASR systems, establishing non-autoregressive diffusion as a viable paradigm for high-throughput transcription.
AI · Bearish · arXiv – CS AI · Jun 9 · 7/10
🧠Researchers demonstrate that generative perplexity (gen-PPL), the primary metric for evaluating non-autoregressive language models, is fundamentally flawed because it measures only predictability under frozen scorers, not actual text quality. They construct deliberately naive samplers that achieve state-of-the-art gen-PPL scores while producing incoherent text, showing the metric's inadequacy, and advocate distributional divergence metrics instead.
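To make the gen-PPL failure mode concrete, here is a minimal sketch of how a perplexity score under a frozen scorer can reward degenerate text. The unigram table `SCORER` and the two sample sequences are hypothetical stand-ins chosen for illustration; in practice the scorer would be a pretrained language model, and none of this is taken from the paper itself.

```python
import math

def perplexity(log_probs):
    """Perplexity = exp of the negative mean per-token log-probability."""
    if not log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(log_probs) / len(log_probs))

# Hypothetical frozen scorer: a fixed unigram table (assumption for illustration;
# a real gen-PPL scorer conditions each token on its context).
SCORER = {"the": 0.5, "cat": 0.2, "sat": 0.2, "mat": 0.1}

def score(tokens):
    """Per-token log-probabilities assigned by the frozen scorer."""
    return [math.log(SCORER[t]) for t in tokens]

varied = ["the", "cat", "sat", "the", "mat"]   # uses the whole vocabulary
degenerate = ["the"] * 5                       # naive sampler: repeat the likeliest token

ppl_varied = perplexity(score(varied))          # about 3.98
ppl_degenerate = perplexity(score(degenerate))  # about 2.0

print(f"varied:     {ppl_varied:.2f}")
print(f"degenerate: {ppl_degenerate:.2f}")
```

The repeated-token sequence gets the lower (nominally better) score even though it carries no content, which is the gap between predictability and quality that the summary describes.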
AI · Neutral · arXiv – CS AI · Jun 2 · 6/10
🧠Researchers introduce DiffuSent, a non-autoregressive diffusion framework that reformulates seven aspect-based sentiment analysis (ABSA) subtasks as boundary denoising processes. The approach achieves significant improvements over existing generative models, particularly on multi-word expressions, while delivering up to 181x faster inference by decoding in parallel rather than generating tokens sequentially.
AI · Bullish · arXiv – CS AI · May 29 · 6/10
🧠Researchers introduce NaRA (Noise-aware Low-Rank Adaptation), a parameter-efficient fine-tuning method designed specifically for diffusion large language models that adapts to the noise level during the denoising process. Unlike existing methods such as LoRA, which use static parameters, NaRA employs a hypernetwork to dynamically adjust the low-rank matrices based on noise, achieving better performance on reasoning and code generation tasks.
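As a rough illustration of the contrast between a static low-rank update and a noise-conditioned one, the sketch below gates each rank component with a tiny hypernetwork that takes the noise level as input. The gating form, the names `gates`, `H_w`, `H_b`, and all shapes are assumptions made for this example; the summary does not specify NaRA's actual parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 8, 6, 2

W = rng.normal(size=(d_out, d_in))        # frozen base weight
A = rng.normal(size=(rank, d_in)) * 0.1   # low-rank down-projection
B = rng.normal(size=(d_out, rank)) * 0.1  # low-rank up-projection

# Hypothetical hypernetwork: one linear layer mapping the scalar noise level
# to a gate per rank component (an assumption, not the paper's design).
H_w = rng.normal(size=(rank, 1))
H_b = np.zeros(rank)

def gates(noise_level):
    """Per-rank gates in (-1, 1) computed from the noise level."""
    return np.tanh(H_w[:, 0] * noise_level + H_b)

def static_lora(x):
    """Standard LoRA: the same low-rank update at every denoising step."""
    return W @ x + B @ (A @ x)

def noise_aware_lora(x, noise_level):
    """Low-rank update rescaled per rank component by the noise-dependent gates."""
    return W @ x + B @ (gates(noise_level) * (A @ x))

x = rng.normal(size=d_in)
early = noise_aware_lora(x, noise_level=0.9)  # heavily noised step
late = noise_aware_lora(x, noise_level=0.2)   # nearly clean step
print(np.abs(early - late).max())             # nonzero: the adapter changes with noise
```

With this gating choice the adapter vanishes at zero noise and varies smoothly in between, whereas `static_lora` applies one fixed update regardless of the denoising step.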
AI · Neutral · arXiv – CS AI · May 9 · 6/10
🧠Researchers propose Cola DLM, a hierarchical latent diffusion language model that generates text through continuous semantic modeling rather than traditional left-to-right autoregressive decoding. The approach achieves comparable performance to autoregressive models while offering greater flexibility, better scaling properties, and a potential pathway for unified modeling across discrete and continuous modalities.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers identify a critical failure mode in non-autoregressive diffusion language models caused by proximity bias, where the denoising process concentrates on adjacent tokens, creating spatial error propagation. They propose a minimal-intervention approach using a lightweight planner and temperature annealing to guide early token selection, achieving substantial improvements on reasoning and planning tasks.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce SyncSpeech, a new text-to-speech model that combines autoregressive and non-autoregressive approaches using a Temporal Mask Transformer architecture. The model achieves 5.8x lower first-packet latency and 8.8x improved real-time performance while maintaining speech quality comparable to existing models.
AI · Neutral · arXiv – CS AI · Feb 27 · 6/10
🧠Researchers identify why Diffusion Language Models (DLMs) struggle with parallel token generation, finding that training data structure forces autoregressive-like behavior. They propose NAP, a data-centric approach using multiple independent reasoning trajectories that improves parallel decoding performance on math benchmarks.