y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#draft-models News & Analysis

4 articles tagged with #draft-models. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

4 articles
AIBullisharXiv – CS AI · May 127/10
🧠

PARD-2: Target-Aligned Parallel Draft Model for Dual-Mode Speculative Decoding

PARD-2 introduces a dual-mode speculative decoding framework that accelerates large language model inference by up to 6.94× through improved draft model training aligned with token acceptance rather than prediction accuracy. The advancement uses Confidence-Adaptive Token optimization to enable single draft models to operate in both target-dependent and target-independent modes, significantly outperforming existing methods like EAGLE-3.

🧠 Llama
AIBullisharXiv – CS AI · Mar 167/10
🧠

When Drafts Evolve: Speculative Decoding Meets Online Learning

Researchers introduce OnlineSpec, a framework that uses online learning to continuously improve draft models in speculative decoding for large language model inference acceleration. The approach leverages verification feedback to evolve draft models dynamically, achieving up to 24% speedup improvements across seven benchmarks and three foundation models.

AIBullisharXiv – CS AI · Mar 117/10
🧠

Efficiently Aligning Draft Models via Parameter- and Data-Efficient Adaptation

Researchers introduce Efficient Draft Adaptation (EDA), a framework that significantly reduces the cost of adapting draft models for speculative decoding when target LLMs are fine-tuned. EDA achieves superior performance through decoupled architecture, data regeneration, and smart sample selection while requiring substantially less training resources than full retraining.

AIBullisharXiv – CS AI · Mar 37/104
🧠

Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding

Researchers introduce Group Tree Optimization (GTO), a new training method that improves speculative decoding for large language models by aligning draft model training with actual decoding policies. GTO achieves 7.4% better acceptance length and 7.7% additional speedup over existing state-of-the-art methods across multiple benchmarks and LLMs.