y0news

#cache-optimization News & Analysis

2 articles tagged with #cache-optimization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 7h ago · 7/10

Leyline: KV Cache Directives for Agentic Inference

Leyline introduces a new serving-side primitive for managing KV cache in agentic LLMs, enabling efficient content editing and removal without full re-computation. The system uses declarative directives and RoPE-rotation corrections to handle policy-driven cache modifications, improving cache efficiency by 11.2 percentage points and agent solve rates by 14.3 percentage points.
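The RoPE-rotation correction mentioned above rests on a general property of rotary position embeddings: rotations compose, so a key cached at one position can be moved to another by applying one more rotation instead of recomputing it. The following is a minimal sketch of that property only, not Leyline's implementation; the function names `rope` and `shift_cached_key`, the interleaved pair layout, and the base of 10000 are assumptions made for illustration.

```python
import math


def rope(vec, pos, base=10000.0):
    """Rotate each (even, odd) pair of `vec` by pos * theta_i (interleaved RoPE layout)."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = base ** (-i / d)  # per-pair frequency
        angle = pos * theta
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def shift_cached_key(cached_key, delta, base=10000.0):
    """Hypothetical helper: move an already-rotated key by `delta` positions.

    Because R(p) followed by R(delta) equals R(p + delta), one extra rotation
    is enough; the unrotated key is never needed.
    """
    return rope(cached_key, delta, base)


# A key cached at position 10 slides to position 7 after 3 earlier tokens are removed.
raw_key = [0.5, -1.0, 2.0, 0.25]
cached = rope(raw_key, 10)
corrected = shift_cached_key(cached, -3)
recomputed = rope(raw_key, 7)
```

In this sketch `corrected` and `recomputed` agree up to floating-point error, which is why removing content from the middle of a cached sequence does not have to force a full re-computation of the keys that follow it.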

AI · Neutral · arXiv – CS AI · 7h ago · 6/10

Hybrid Verified Decoding: Learning to Allocate Verification in Speculative Decoding

Researchers propose Hybrid Verified Decoding, a technique that improves LLM inference speed by intelligently choosing between cache-based and model-based token drafting methods. The approach predicts draft acceptance rates before verification, achieving 2.73x average speedup on agentic workflows and outperforming existing methods like EAGLE3.
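To make the idea of choosing a drafter from a predicted acceptance rate concrete, here is a rough sketch. It is not the paper's method: it uses the standard speculative-decoding estimate of expected tokens per verification step, (1 - a^(k+1)) / (1 - a) for acceptance rate a and draft length k, and assumes independent per-token acceptance. The names `expected_tokens` and `pick_drafter`, and all cost figures, are hypothetical.

```python
def expected_tokens(alpha, k):
    """Expected tokens produced per verification step for acceptance rate `alpha`
    and draft length `k`, assuming independent per-token acceptance."""
    if alpha >= 1.0:
        return float(k + 1)
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def pick_drafter(predicted_alpha, k, draft_cost, verify_cost=1.0):
    """Hypothetical selector: return the drafter with the best expected tokens per unit cost.

    predicted_alpha: drafter name -> predicted acceptance rate (an assumed input)
    draft_cost:      drafter name -> cost of drafting one token, relative to
                     one verification pass of the target model (assumed values)
    """
    def throughput(name):
        cost = verify_cost + k * draft_cost[name]
        return expected_tokens(predicted_alpha[name], k) / cost

    return max(predicted_alpha, key=throughput)


# Assumed costs: cache lookup is nearly free, a draft model costs 10% of a target pass per token.
costs = {"cache": 0.0, "model": 0.1}

# Repetitive context: the cached continuation is likely to be accepted.
choice_repetitive = pick_drafter({"cache": 0.9, "model": 0.6}, k=4, draft_cost=costs)

# Novel context: the cache rarely matches, so the draft model is worth its cost.
choice_novel = pick_drafter({"cache": 0.2, "model": 0.8}, k=4, draft_cost=costs)
```

Under these assumed numbers the selector prefers the cache-based drafter in the repetitive case and the model-based drafter in the novel case, which is the kind of trade-off a pre-verification acceptance predictor would let a serving system exploit.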