AIBullisharXiv – CS AI · 18h ago7/10
🧠
End-to-End Context Compression at Scale
Researchers introduce Latent Context Language Models (LCLMs), a new encoder-decoder compression approach that addresses memory bottlenecks in long-context language model inference. By compressing KV caches at ratios of 1:4 to 1:16 while maintaining model quality, LCLMs enable faster processing of extended contexts and support adaptive expansion for long-horizon agent applications.