AIBullisharXiv – CS AI · 10h ago7/10
🧠
PARD-2: Target-Aligned Parallel Draft Model for Dual-Mode Speculative Decoding
PARD-2 introduces a dual-mode speculative decoding framework that accelerates large language model inference by up to 6.94× through improved draft model training aligned with token acceptance rather than prediction accuracy. The advancement uses Confidence-Adaptive Token optimization to enable single draft models to operate in both target-dependent and target-independent modes, significantly outperforming existing methods like EAGLE-3.
🧠 Llama