AIBullisharXiv – CS AI · 6h ago7/10
🧠
SENSE: Semantic Embedding Navigation with Soft-gated Evaluation for Retrieval-based Speculative Decoding
SENSE is a new retrieval-based speculative decoding method that accelerates LLM inference by using semantic embeddings instead of lexical matching to retrieve candidate tokens. The approach achieves up to 3.26x speedup while maintaining generation quality, outperforming existing methods on LLaMA and Qwen models.