Learning What Not to Forget: Long-Horizon Agent Memory from a Few Kilobytes of Learning
Researchers present LRE (Learned Relevance Eviction), a lightweight memory management system for long-running language model agents that decides which historical information to retain when the context window fills up. The approach uses a small, CPU-based scorer to identify critical details such as access tokens and task-relevant information. It achieves accuracy comparable to keeping the full history while reducing peak context size by up to 52% and completing agent tasks in fewer calls.
The challenge of managing context windows in deployed language model systems has become increasingly critical as agents handle longer interactions. LRE addresses a fundamental operational problem: when systems accumulate interaction history exceeding token limits, they must decide what to discard. Current approaches either keep everything (computationally expensive) or use generic pruning strategies that risk losing load-bearing details, causing downstream task failures. This research demonstrates that learned relevance scoring can solve this fidelity problem more efficiently than existing alternatives.
The technical contribution centers on a parameter-efficient scorer trained to predict which historical units will matter for future operations. Because it runs on CPU only and makes no neural compression calls, LRE remains practical to deploy in resource-constrained environments. The reported results are substantial: on basic tasks, LRE exceeds the no-eviction baseline by 27% while reducing computational overhead, and on complex agent workflows, it matches full-history accuracy while completing tasks in 37% fewer calls. An annotation-free training variant recovers 95% of supervised performance, which suggests the scorer can be trained from system behavior alone.
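To make the idea of score-based eviction concrete, here is a minimal sketch of how a tiny scorer could drive eviction under a token budget. It is an illustration under stated assumptions, not the method from the research: the `Unit` type, the two hand-picked features, the linear `score` function, the weights, and the greedy lowest-score-first policy are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Unit:
    """One piece of interaction history (hypothetical structure)."""
    text: str
    tokens: int
    features: List[float]  # assumed features, e.g. [has_credential, task_overlap]


def score(unit: Unit, weights: List[float], bias: float = 0.0) -> float:
    """Linear relevance score; a stand-in for a small learned scorer."""
    return bias + sum(w * f for w, f in zip(weights, unit.features))


def evict(history: List[Unit], weights: List[float], budget: int) -> List[Unit]:
    """Drop the lowest-scoring units until the total token count fits the budget."""
    total = sum(u.tokens for u in history)
    # Indices ordered from least to most relevant.
    order = sorted(range(len(history)), key=lambda i: score(history[i], weights))
    dropped = set()
    for i in order:
        if total <= budget:
            break
        dropped.add(i)
        total -= history[i].tokens
    # Preserve the original chronological order of the surviving units.
    return [u for i, u in enumerate(history) if i not in dropped]


history = [
    Unit("greeting", 50, [0.0, 0.1]),
    Unit("access token: abc", 20, [1.0, 0.2]),
    Unit("verbose log", 200, [0.0, 0.0]),
    Unit("task goal", 40, [0.0, 1.0]),
]
weights = [2.0, 1.5]  # illustrative values, not learned
kept = evict(history, weights, budget=120)
print([u.text for u in kept])
```

In this toy run the verbose log scores lowest and is evicted first, which is enough to bring the history under budget while the access token and task goal survive. A real scorer would learn its weights from data rather than use fixed values.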
For AI infrastructure, this work has practical implications. Long-horizon agents underpin emerging applications in autonomous task completion, multi-step reasoning, and persistent conversation systems, and efficient memory management directly affects their deployment feasibility and operating costs. The research suggests that learned eviction policies can outperform both naive retention and expensive neural compression, opening the way to agent systems that are more capable yet cheaper to run. Development teams building production agents face immediate decisions about memory strategies, and this approach offers empirical evidence that learned relevance is a viable path forward, without requiring a large language model for the eviction decision itself.
- LRE matches full-history accuracy on agent tasks while reducing peak context size by 52% using only kilobytes of learned parameters
- The CPU-only scorer operates without neural compression calls, making it deployable in resource-constrained environments
- Annotation-free training on system behavior alone recovers 95% of supervised performance, improving practical applicability
- On standardized reading comprehension tasks, LRE achieves best budgeted answer quality while reading 68% fewer tokens than baselines
- Memory eviction in LLM systems is fundamentally a fidelity problem requiring proactive policies when future queries are unavailable