y0news

#parallel-decoding News & Analysis

9 articles tagged with #parallel-decoding. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 5d ago · 7/10

LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

Researchers introduce LocateAnything, a new vision-language model framework that uses Parallel Box Decoding to detect and localize objects simultaneously rather than sequentially, improving both inference speed and accuracy. The team curated a 138-million-sample dataset and demonstrated significant performance improvements across multiple benchmarks.

AI · Bullish · arXiv – CS AI · May 11 · 7/10

Regulating Branch Parallelism in LLM Serving

Researchers introduce TAPER, an admission controller for managing parallel branch execution in LLM serving systems. The system dynamically regulates how many concurrent decoding branches are allowed per request step, balancing throughput gains against the degradation imposed on co-batched requests, and achieves a 1.77x improvement in goodput over conservative baselines.

AI · Bullish · arXiv – CS AI · Apr 14 · 7/10

FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models

Researchers introduce FS-DFM, a discrete flow-matching model that generates long text 128x faster than standard diffusion models while maintaining quality parity. The breakthrough uses few-step sampling with teacher guidance distillation, achieving in 8 steps what previously required 1,024 evaluations.

🏢 Perplexity
AI · Bullish · arXiv – CS AI · 18h ago · 6/10

Chatterbox-Flash: Prior-Calibrated Block Diffusion for Streaming Zero-Shot TTS

Researchers introduce Chatterbox-Flash, a zero-shot text-to-speech model combining block-diffusion decoding with streaming capabilities. The system addresses token distribution bias through prior-calibrated scoring and early-decoding schedules, achieving high-fidelity speech synthesis with low latency comparable to autoregressive systems.

AI · Neutral · arXiv – CS AI · 18h ago · 6/10

Stop the Flip-Flop: Context-Preserving Verification for Fast Revocable Diffusion Decoding

Researchers introduce COVER, a new verification technique for diffusion language models that eliminates inefficient token oscillations during parallel decoding. By using KV cache overrides to preserve context while selectively verifying tokens in a single forward pass, COVER accelerates inference while maintaining output quality.

AI · Neutral · arXiv – CS AI · Apr 20 · 6/10

DepCap: Adaptive Block-Wise Parallel Decoding for Efficient Diffusion LM Inference

Researchers introduce DepCap, a training-free framework that optimizes diffusion language model (DLM) inference through adaptive block-wise parallel decoding. The method achieves up to 5.63× speedup by using cross-step signals to determine block boundaries and identifying conflict-free token subsets for safe parallel execution, maintaining quality while significantly accelerating inference.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Parallelism and Generation Order in Masked Diffusion Language Models: Limits Today, Potential Tomorrow

Researchers evaluated eight large Masked Diffusion Language Models (MDLMs, up to 100B parameters) and found they still underperform comparable autoregressive models despite promises of parallel token generation. The study reveals that MDLMs exhibit task-dependent decoding behavior and proposes a Generate-then-Edit paradigm to improve performance while maintaining parallel processing efficiency.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 6

MetaState: Persistent Working Memory for Discrete Diffusion Language Models

Researchers introduce MetaState, a recurrent augmentation for discrete diffusion language models (dLLMs) that adds persistent working memory to improve text generation quality. The system addresses the 'Information Island' problem where intermediate representations are discarded between denoising steps, achieving improved accuracy on LLaDA-8B and Dream-7B models with minimal parameter overhead.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size

Researchers introduce AdaBlock-dLLM, a training-free optimization technique for diffusion-based large language models that adaptively adjusts block sizes during inference based on semantic structure. The method addresses limitations of conventional fixed-block semi-autoregressive decoding, achieving accuracy improvements of up to 5.3% under the same throughput budget.