When Drafts Evolve: Speculative Decoding Meets Online Learning
Researchers introduce OnlineSpec, a framework that uses online learning to continuously improve draft models in speculative decoding for large language model inference acceleration. The approach leverages verification feedback to evolve draft models dynamically, achieving up to 24% speedup improvements across seven benchmarks and three foundation models.