AINeutralarXiv – CS AI · 8h ago6/10
🧠
When Does Value-Aware KV Eviction Help? A Fixed-Contract Diagnostic for Non-Monotone Cache Compression
Researchers present a diagnostic framework for evaluating KV cache eviction selectors in large language models, identifying three failure modes and demonstrating that value-aware ranking combined with evidence recovery achieves 72.6% accuracy on positive-margin test cases. The work addresses a critical bottleneck in long-context LLM inference by revealing why compression strategies succeed or fail.