y0news

#optimal-brain-damage News & Analysis

1 article tagged with #optimal-brain-damage. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

1 article
AI · Bullish · arXiv – CS AI · 7h ago · 7/10

OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference

Researchers propose OBCache, a KV cache pruning framework that improves memory efficiency for long-context LLM inference. It measures token importance by each token's actual impact on attention outputs rather than by heuristic attention weights. The method is grounded in Optimal Brain Damage theory and shows consistent accuracy improvements over existing eviction strategies on LLaMA and Qwen models.
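
To make the contrast between weight-based and output-based scoring concrete, here is a minimal sketch for a single query and a single attention head. It is not taken from the paper: the scoring rule is an illustrative leave-one-out calculation, and the function names (`attention`, `output_impact_scores`, `keep_top_k`) are hypothetical. The assumption is that removing token j and renormalizing the remaining weights shifts the output by a_j / (1 - a_j) * ||v_j - o||, where a_j is the token's attention weight, v_j its value vector, and o the full attention output.

```python
import math

def attention(q, keys, values):
    """Single-query scaled dot-product attention. Returns (weights, output)."""
    scale = math.sqrt(len(q))
    logits = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    output = [sum(w * v[c] for w, v in zip(weights, values)) for c in range(dim)]
    return weights, output

def output_impact_scores(q, keys, values):
    """Illustrative score (an assumption, not the paper's formula): the L2 norm
    of the change in the attention output when one token is dropped and the
    remaining weights are renormalized."""
    weights, output = attention(q, keys, values)
    scores = []
    for a, v in zip(weights, values):
        if 1.0 - a < 1e-12:
            # The token carries essentially all the weight; treat it as unprunable.
            scores.append(math.inf)
            continue
        dist = math.sqrt(sum((vc - oc) ** 2 for vc, oc in zip(v, output)))
        scores.append(a / (1.0 - a) * dist)
    return weights, scores

def keep_top_k(scores, k):
    """Indices of the k highest-scoring tokens, returned in original order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

# Toy cache: tokens 0 and 1 get high attention but carry identical values,
# while token 2 gets low attention but carries a distinct value.
q = [1.0, 0.0]
keys = [[2.0, 0.0], [2.0, 0.0], [0.0, 0.0]]
values = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

weights, scores = output_impact_scores(q, keys, values)
print("attention weights:", weights)
print("output-impact scores:", scores)
print("tokens kept (k=2):", keep_top_k(scores, 2))
```

In this toy case the token with the smallest attention weight has the largest output impact, because the two high-weight tokens are redundant with each other. A purely weight-based eviction rule would drop it first, which is the kind of mismatch the summary attributes to heuristic attention-weight scoring.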