AI · Bullish · arXiv – CS AI · 5h ago · 7/10
SpecBranch: Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism
SpecBranch is a speculative decoding framework that uses branch parallelism to accelerate large language model inference, achieving 1.8x to 4.5x speedups over standard autoregressive decoding. It addresses the serialization bottleneck in existing speculative decoding methods by running parallel drafting branches with adaptive token lengths and rollback-aware orchestration.
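
For readers unfamiliar with the baseline that SpecBranch builds on, the following is a minimal sketch of plain draft-then-verify speculative decoding with greedy acceptance and rollback. It is not the SpecBranch algorithm: it has no parallel branches and uses a fixed draft length. The toy `target_model` and `draft_model` functions, the vocabulary size, and the `draft_len` parameter are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, List

# A "model" here is just a function from a token prefix to the next token (greedy).
Model = Callable[[List[int]], int]

VOCAB = 50  # hypothetical vocabulary size


def target_model(prefix: List[int]) -> int:
    # Hypothetical stand-in for the large, slow model: a deterministic toy rule.
    return (sum(prefix) * 31 + len(prefix)) % VOCAB


def draft_model(prefix: List[int]) -> int:
    # Hypothetical stand-in for the small, fast model: it agrees with the
    # target except when the prefix length is a multiple of 4.
    guess = target_model(prefix)
    return guess if len(prefix) % 4 else (guess + 1) % VOCAB


def autoregressive_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    # Baseline: one model call per generated token.
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, draft_len: int = 4) -> List[int]:
    tokens = list(prompt)
    goal = len(prompt) + n_new
    while len(tokens) < goal:
        # Draft phase: the small model proposes draft_len tokens in sequence.
        proposal: List[int] = []
        for _ in range(draft_len):
            proposal.append(draft(tokens + proposal))

        # Verify phase: a real system would score all positions in one batched
        # forward pass of the target; this sketch checks them one by one.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # Rollback: drop the rest of the draft, keep the target's token.
                accepted.append(expected)
                break
        else:
            # Every drafted token matched, so the target adds one more token.
            accepted.append(target(tokens + accepted))

        tokens.extend(accepted)
    return tokens[:goal]
```

Because every kept token is one the target model would have produced for the same prefix, the output matches ordinary greedy decoding with the target. Any speedup in practice comes from verifying several drafted tokens per target pass. In this serial sketch the draft and verify phases still wait on each other, which is the bottleneck the summary says SpecBranch targets.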