Beyond Semantic Organization: Memory as Execution State Management for Long-Horizon Agents
Researchers introduce MAGE, a memory management system for LLM-based agents that organizes task histories as hierarchical state trees rather than as clusters of semantically similar content. On long-horizon tasks with interdependent decisions, it improves task success rates by 7.8 to 20.4 percentage points over semantic memory baselines while cutting token consumption by 55.1%.
The paper addresses a fundamental architectural problem in current LLM agent systems: existing memory approaches organize information by semantic relevance rather than by execution state, which fragments decision trajectories and mixes valid execution paths with erroneous ones. This mismatch grows more damaging as agents take on complex, long-horizon tasks, where each action constrains later options and early errors cascade. MAGE's state-tree approach marks a meaningful shift toward execution-aware memory design, treating agent memory as active state management rather than passive retrieval.
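To make the contrast concrete, here is a toy sketch, assuming a word-overlap (Jaccard) score as a stand-in for embedding similarity. The names `Step`, `semantic_retrieve`, and `active_path`, and the flight-booking scenario, are hypothetical and not part of MAGE. It shows how a relevance-ranked memory can surface a step from an abandoned branch, while following parent links back from the current step recovers only the live trajectory.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    id: int
    text: str
    parent: Optional[int]  # id of the previous step on the same trajectory
    failed: bool = False   # True if the step lies on an abandoned branch


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity, a crude stand-in for embedding similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def semantic_retrieve(steps: list[Step], query: str, k: int = 2) -> list[Step]:
    """Relevance-based memory: rank every stored step by similarity to the query."""
    return sorted(steps, key=lambda s: jaccard(s.text, query), reverse=True)[:k]


def active_path(steps: list[Step], current_id: int) -> list[Step]:
    """Execution-state view: follow parent links from the current step to the root."""
    by_id = {s.id: s for s in steps}
    path, node = [], by_id.get(current_id)
    while node is not None:
        path.append(node)
        node = by_id.get(node.parent) if node.parent is not None else None
    return path[::-1]


history = [
    Step(0, "open booking site", None),
    Step(1, "search flights to paris", 0),
    Step(2, "book flight to paris on monday", 1, failed=True),  # sold out, abandoned
    Step(3, "book flight to paris on tuesday", 1),              # current step
]

print([s.id for s in semantic_retrieve(history, "book flight to paris")])  # [2, 3]
print([s.id for s in active_path(history, 3)])                              # [0, 1, 3]
```

The failed Monday attempt and the live Tuesday step score identically against the query, so similarity ranking returns both; the path view simply never reaches step 2.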
The technical innovation lies in four coupled operations that maintain memory integrity: Grow captures new interactions, Compress summarizes completed subgoals, Maintain validates those summaries, and Revise enables branch exploration. By anchoring agent state to an active root-to-current path within a hierarchical tree, MAGE preserves decision context while isolating flawed segments on branches off that path. Because completed subgoals are compressed into summaries, this design bounds context growth, a critical practical constraint for token-limited systems, while retaining enough information for coherent planning.
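The four operations can be pictured with a minimal sketch. The paper's description here is high-level, so everything below is an assumed interpretation: `StateTree`, `Node`, the caller-supplied `summarize` and `validate` callbacks (stand-ins for what would presumably be LLM calls), and the rule that Maintain simply discards a summary that fails validation are all hypothetical, not MAGE's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass(eq=False)  # identity equality: nodes link to each other in both directions
class Node:
    content: str
    parent: Optional[Node] = field(default=None, repr=False)
    children: list[Node] = field(default_factory=list, repr=False)
    summary: Optional[str] = None                    # set by compress()
    detail: list[str] = field(default_factory=list)  # steps folded into the summary


class StateTree:
    """Toy execution-state memory; a hypothetical interpretation, not MAGE's code."""

    def __init__(self, goal: str) -> None:
        self.root = Node(goal)
        self.current = self.root  # tip of the active root-to-current path

    def path(self) -> list[Node]:
        """Nodes from the root down to the current node."""
        nodes, node = [], self.current
        while node is not None:
            nodes.append(node)
            node = node.parent
        return nodes[::-1]

    def grow(self, interaction: str) -> Node:
        """Grow: record a new interaction as a child of the current node."""
        node = Node(interaction, parent=self.current)
        self.current.children.append(node)
        self.current = node
        return node

    def compress(self, subgoal: Node, summarize: Callable[[list[str]], str]) -> None:
        """Compress: fold the active path below a finished subgoal into one summary."""
        steps, node = [], self.current
        while node is not subgoal:
            if node is None:
                raise ValueError("subgoal is not on the active path")
            steps.append(node.content)
            node = node.parent
        subgoal.detail = steps[::-1]
        subgoal.summary = summarize(subgoal.detail)
        self.current = subgoal  # later growth continues after the summary

    def maintain(self, validate: Callable[[str, list[str]], bool]) -> None:
        """Maintain: drop summaries on the active path that fail validation."""
        for node in self.path():
            if node.summary is not None and not validate(node.summary, node.detail):
                node.summary = None  # context() falls back to the raw detail

    def revise(self, node: Node) -> None:
        """Revise: back up to an earlier node; the abandoned branch stays off-path."""
        if node not in self.path():
            raise ValueError("can only revise to a node on the active path")
        self.current = node

    def context(self) -> str:
        """Render only the active path, i.e. what the agent would be prompted with."""
        lines = []
        for node in self.path():
            lines.append(node.content)
            if node.summary is not None:
                lines.append("  summary: " + node.summary)
            else:
                lines.extend("  " + step for step in node.detail)
        return "\n".join(lines)


tree = StateTree("plan trip")
sg = tree.grow("subgoal: book flight")
tree.grow("search flights")
bad = tree.grow("pick monday flight")  # suppose this turns out to be sold out
tree.revise(bad.parent)                # back up to "search flights"
tree.grow("pick tuesday flight")
tree.compress(sg, lambda steps: "booked via " + steps[-1])
tree.maintain(lambda summary, detail: detail[-1] in summary)
print(tree.context())
```

Only the output of `context()` would reach the model: the abandoned Monday branch stays in the tree but never re-enters the prompt, and the finished subgoal collapses to a single summary line, which is how this sketch keeps context size bounded as the task grows.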
The 55.1% reduction in token consumption translates directly into lower deployment cost. For developers building production LLM agents, it addresses a significant pain point: long-horizon tasks rapidly exhaust context windows. The higher task success rates suggest MAGE improves agent reliability, not just efficiency. The gains appear consistent across the reported experimental settings, though whether they carry over to deployments beyond benchmark conditions remains to be shown.
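Because the success-rate gains are reported in percentage points and the token savings as a percentage, a short worked form helps keep the two apart. Here \(T\) denotes tokens consumed and \(\mathrm{SR}\) the task success rate in percent; these symbols are introduced for illustration and do not come from the paper.

```latex
\[
T_{\text{MAGE}} = (1 - 0.551)\,T_{\text{base}} = 0.449\,T_{\text{base}},
\qquad
\Delta_{\text{pp}} = \mathrm{SR}_{\text{MAGE}} - \mathrm{SR}_{\text{base}} \in [7.8,\ 20.4]
\]
```

A hypothetical baseline run consuming 100,000 tokens would therefore use about 44,900 with MAGE. The success-rate figures are absolute differences, so a 7.8-point gain would, for example, lift a hypothetical 50% success rate to 57.8%, not merely to 53.9% as a 7.8% relative increase would.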
The research leaves open how MAGE integrates with different LLM architectures and whether it scales to more complex multi-agent scenarios. Future work on hybrid approaches that combine execution-state and semantic organization could further improve performance.
- MAGE improves task success rates by 7.8 to 20.4 percentage points over semantic-based memory systems.
- The hierarchical state-tree architecture separates valid execution paths from erroneous branches, improving error isolation.
- A 55.1% reduction in token consumption addresses a critical efficiency bottleneck in long-horizon agent deployment.
- The active memory management approach treats agent memory as state reconstruction rather than passive information retrieval.
- Four coupled operations (Grow, Compress, Maintain, Revise) preserve context integrity while bounding context growth.