Cooperative Memory Paging with Keyword Bookmarks for Long-Horizon LLM Conversations
Researchers propose cooperative paging, a method for managing long LLM conversations by replacing evicted context with compact keyword bookmarks and providing a recall tool for on-demand retrieval. The technique outperforms existing solutions on the LoCoMo benchmark across multiple models, though bookmark discrimination remains a critical limitation.
Cooperative paging addresses a fundamental constraint in large language model applications: the finite context window that limits conversation length. As LLMs process increasingly long dialogues, earlier content must be removed to stay within token limits, yet models still need access to that information when relevant. This research demonstrates that minimal keyword summaries (~8-24 tokens) can effectively replace full context segments while enabling the model to selectively retrieve complete information through a dedicated tool.
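The mechanism described above can be sketched as a small page store: when a conversation segment is evicted from the context window, it is archived and replaced in-context by a short keyword bookmark, and the model is given a recall tool to swap any bookmark back for the full page. The class, names, and token budgets below are illustrative assumptions, not the paper's implementation.

```python
from collections import OrderedDict

PAGE_TOKENS = 256      # hypothetical fixed page size
BOOKMARK_TOKENS = 16   # target bookmark length, inside the ~8-24 token range cited

class PagedMemory:
    """Toy cooperative-paging store: evicted pages leave keyword bookmarks behind."""

    def __init__(self):
        self.archive = {}               # page_id -> full page text
        self.bookmarks = OrderedDict()  # page_id -> compact bookmark kept in context

    def evict(self, page_id: int, text: str, keywords: list[str]) -> str:
        """Archive a full page; return the bookmark that replaces it in context."""
        self.archive[page_id] = text
        bookmark = f"[page {page_id}: {', '.join(keywords)}]"
        self.bookmarks[page_id] = bookmark
        return bookmark

    def recall(self, page_id: int) -> str:
        """The recall tool: the model calls this to retrieve a full page on demand."""
        return self.archive.get(page_id, "")

mem = PagedMemory()
mem.evict(0, "Alice said her flight to Lisbon leaves May 3rd.",
          ["Alice", "flight", "Lisbon", "May 3"])
print(mem.bookmarks[0])  # compact stand-in that stays in the context window
print(mem.recall(0))     # full page retrieved only when the model asks for it
```

In a real deployment, `recall` would be exposed to the model as a tool/function call, and the keywords would be generated by the model itself at eviction time rather than supplied by hand.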
The technical approach reflects broader efforts to extend LLM capabilities beyond architectural limits. Previous solutions relied on truncation, dense retrieval systems, or maintaining full context—each with tradeoffs between memory efficiency and answer quality. Cooperative paging achieves superior performance on standardized benchmarks by balancing storage efficiency with selective retrieval, suggesting a pragmatic middle ground.
The findings reveal nuanced design considerations: fixed-size pages consistently outperform content-aware boundary detection, suggesting simple heuristics often trump sophisticated segmentation. The critical bottleneck identified—bookmark discrimination—has direct practical implications: when the model cannot distinguish between bookmarks, it wastes tokens retrieving irrelevant pages, undermining the system's efficiency gains. The 25-percentage-point accuracy gap tied to keyword specificity suggests that even marginal improvements in bookmark generation could yield substantial performance gains.
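Why keyword specificity matters can be shown with a minimal discrimination sketch: if page selection reduces to matching query terms against bookmark keywords, generic keywords make all bookmarks look alike and the model retrieves the wrong pages. The scoring function and keyword sets below are illustrative assumptions, not the paper's selection method.

```python
def score(query_terms: set[str], bookmark_terms: set[str]) -> float:
    """Fraction of query terms covered by a bookmark's keywords."""
    return len(query_terms & bookmark_terms) / max(len(query_terms), 1)

query = {"lisbon", "flight", "date"}
specific = {"alice", "flight", "lisbon", "may"}  # discriminative keywords
vague = {"travel", "plans", "chat"}              # generic keywords

print(score(query, specific))  # 2 of 3 query terms matched: page stands out
print(score(query, vague))     # no overlap: this page is indistinguishable
```

Under this toy model, a bookmark built from vague keywords scores zero against the query, so the model has no signal for choosing between pages and must either guess or pull several pages, spending the tokens the bookmarks were meant to save.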
For developers building production conversational systems, this research indicates that context management strategies must be benchmarked against real-world conversation patterns, since the eviction policies that perform best on synthetic data differ from those that perform best on real conversations. The work validates that minimal information retrieval systems can outperform comprehensive context retention in practical scenarios, potentially reducing computational costs while improving response quality.
- Cooperative paging with keyword bookmarks achieves higher answer quality than full-context baselines on multi-turn LLM conversations.
- Fixed-size pages substantially outperform content-aware segmentation strategies, suggesting simpler design often works better in practice.
- Bookmark discrimination accuracy directly correlates with system performance, with keyword specificity accounting for a 25-point accuracy difference.
- Optimal eviction policies are data-dependent, requiring different strategies for synthetic versus real conversation patterns.
- The method reduces context overhead while maintaining retrieval capability, promising more efficient long-horizon conversational systems.