The Context Gathering Decision Process: A POMDP Framework for Agentic Search
Researchers introduce the Context Gathering Decision Process (CGDP), a POMDP framework that formalizes how LLM agents should search and gather information from environments exceeding their context windows. The approach yields measurable improvements in multi-hop reasoning (up to 11.4%) and token efficiency (up to 39% savings) through explicit belief state management and programmatic exhaustion detection.
This research addresses a fundamental bottleneck in deploying LLM agents at scale: the inability to search effectively through information spaces larger than model context windows without degrading into redundant or inefficient behavior. The paper's contribution lies in formalizing previously implicit agent behavior through the lens of Partially Observable Markov Decision Processes (POMDPs), providing a principled mathematical framework for understanding and improving agentic search.
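For readers unfamiliar with the formalism, the block below gives the standard POMDP tuple and belief update that such a framework builds on; the concrete mapping to search (states as undiscovered facts, actions as queries, and so on) is an illustrative assumption, since the paper's exact instantiation is not spelled out in this summary.

```latex
% Standard POMDP tuple (S, A, T, R, \Omega, O, \gamma); the CGDP-specific
% reading in these comments is an assumption, not the authors' definition.
%   S            : latent environment states (e.g., which relevant facts remain undiscovered)
%   A            : agent actions (issue a search query, read a result, stop gathering)
%   T(s' | s, a) : transition dynamics
%   R(s, a)      : reward (task success net of token and latency cost)
%   \Omega       : observations (retrieved snippets, tool outputs)
%   O(o | s', a) : observation model
%   \gamma       : discount factor
% After taking action a and observing o, the agent updates its belief b(s):
b'(s') \;\propto\; O(o \mid s', a) \sum_{s \in S} T(s' \mid s, a)\, b(s)
```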
The CGDP framework emerges from a practical need in production systems. As LLM agents interact with codebases, databases, and conversation histories, their working memory degrades without explicit infrastructure to track state. This leads to wasted computation through repetitive loops and premature task abandonment. The researchers model LLM behavior as approximate Thompson Sampling, then derive two concrete interventions: a persistent, predicate-based belief state and a programmatic exhaustion gate.
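The paper's exact data structures are not reproduced in this summary, so the following Python sketch only illustrates what the two interventions could look like in practice; the names (`BeliefState`, `exhaustion_gate`, the predicate dictionary) are assumptions rather than the authors' API, and the approximate Thompson Sampling model itself is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class BeliefState:
    """Persistent, predicate-based belief state kept outside the LLM context.

    Each predicate is a compact claim the agent currently holds about the
    environment (e.g., "config.yaml defines the retry policy"), stored with a
    pointer to supporting evidence rather than the raw retrieved text, so the
    context window stays bounded while reasoning state persists across turns.
    """
    predicates: dict[str, str] = field(default_factory=dict)  # claim -> evidence pointer
    queries_issued: set[str] = field(default_factory=set)     # normalized past queries
    fruitless_rounds: int = 0                                  # consecutive rounds adding no new predicate

    def update(self, query: str, new_claims: dict[str, str]) -> None:
        """Record one search round and track whether it added anything new."""
        self.queries_issued.add(query.strip().lower())
        novel = {c: e for c, e in new_claims.items() if c not in self.predicates}
        self.predicates.update(novel)
        self.fruitless_rounds = 0 if novel else self.fruitless_rounds + 1

    def is_redundant(self, query: str) -> bool:
        """Cheap programmatic check before spending tokens on a repeat search."""
        return query.strip().lower() in self.queries_issued


def exhaustion_gate(belief: BeliefState, max_fruitless_rounds: int = 3) -> bool:
    """Return True when gathering should stop: recent rounds added no new
    predicates, so further search is unlikely to change the belief state."""
    return belief.fruitless_rounds >= max_fruitless_rounds
```

In an agent loop built this way, the controller would consult `is_redundant` before issuing a query and `exhaustion_gate` after each update, stopping search programmatically instead of relying on the model to notice it is looping; how the actual system wires in these checks is likewise an assumption here.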
The empirical validation across multiple question-answering domains demonstrates real-world applicability. Improvements of up to 11.4% in multi-hop reasoning indicate the framework helps agents maintain coherent mental models across multiple reasoning steps. The 39% token savings without performance degradation is particularly significant for cost-sensitive production deployments, where inference expenses scale with context and computation.
For the broader AI infrastructure ecosystem, this work suggests that agentic systems benefit from explicit state management borrowed from classical AI and control theory. Rather than viewing LLMs as black boxes, the framework enables modular, non-interfering optimizations that preserve agent autonomy while constraining inefficiency, bridging machine learning and the classical planning literature.
- CGDP framework formalizes LLM agent search as a POMDP, improving multi-hop reasoning by up to 11.4%
- Predicate-based belief state preserves reasoning capability while bounding context window usage
- Programmatic exhaustion detection reduces token consumption by 39% without degrading task performance
- Approach modularizes implicit agent behavior into explicit, interpretable operations
- Framework applicable across multiple domains and agent architectures without requiring model retraining