arXiv – CS AI · 9h ago
KV Cache Offloading for Context-Intensive Tasks
Researchers demonstrate that KV-cache offloading techniques, designed to reduce memory usage in large language models, significantly degrade performance on context-intensive tasks requiring extensive information extraction. The study introduces the Text2JSON benchmark and identifies low-rank projection and unreliable landmarks as key failure points, proposing improved alternatives.
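The low-rank projection the summary flags as a failure point can be sketched as follows. This is a minimal illustration, not the paper's method: the dimensions, rank, and SVD-fitted basis are assumptions chosen to show why a lossy compressed cache can hurt tasks that must recover exact details from long context.

```python
import numpy as np

# Illustrative sketch of low-rank KV-cache compression (assumed setup,
# not the paper's exact technique). Keys for one attention head are a
# (seq_len, head_dim) matrix; we store only a rank-r projection of it.
rng = np.random.default_rng(0)
seq_len, head_dim, rank = 512, 64, 16

K = rng.standard_normal((seq_len, head_dim))  # stand-in for cached keys

# Fit a rank-r basis from the keys via SVD, project, then reconstruct.
U, S, Vt = np.linalg.svd(K, full_matrices=False)
basis = Vt[:rank]                   # (rank, head_dim) projection basis
K_compressed = K @ basis.T          # (seq_len, rank): what gets stored
K_restored = K_compressed @ basis   # lossy reconstruction at read time

# The reconstruction error below is the information the model loses;
# extraction-heavy tasks that need many exact context details pay for it.
err = np.linalg.norm(K - K_restored) / np.linalg.norm(K)
print(f"relative reconstruction error at rank {rank}: {err:.3f}")
```

Memory drops from `seq_len * head_dim` to `seq_len * rank` floats per head, which is the appeal; the nonzero reconstruction error is the cost the benchmark surfaces.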
🧠 Llama