#context-management News & Analysis

23 articles tagged with #context-management. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

23 articles

AIBearisharXiv – CS AI · Jun 237/10

🧠

Plans Don't Persist: Why Context Management Is Load Bearing for LLM Agents

Researchers demonstrate that large language model agents fail to maintain plans as persistent internal state, instead relying on plans remaining in the context window. Using diagnostic techniques on Llama-3.1-70B and DeepSeek-R1, the study shows plan signal decays rapidly when compressed out of context, with practical implications for agent reliability in long-horizon tasks.

🧠 Llama

AIBullisharXiv – CS AI · Jun 107/10

🧠

Less Context, Better Agents: Efficient Context Engineering for Long-Horizon Tool-Using LLM Agents

Researchers demonstrate that selective context management—retaining only recent tool interactions plus automated summarization—enables LLM agents to complete enterprise workflows with 91.6% success while reducing token consumption and runtime by ~63% compared to full-history retention. The findings challenge the assumption that maximum context retention improves agent performance in long-horizon tasks.

🧠 GPT-5🧠 Claude🧠 Sonnet

AIBullisharXiv – CS AI · May 297/10

🧠

Indexing the Unreadable: LLM-Native Recursive Construction and Search of Service Taxonomies

Researchers propose A2X, an LLM-native service discovery system that organizes thousands of callable services into hierarchical taxonomies to solve the context-window limitation problem facing AI agents. The approach achieves 20+ point improvements in retrieval accuracy while reducing token consumption to one-ninth compared to baseline methods, enabling scalable orchestration of distributed services.

AINeutralarXiv – CS AI · May 297/10

🧠

When Should Models Change Their Minds? Contextual Belief Management in Large Language Models

Researchers introduce BeliefTrack, a benchmark for evaluating how large language models manage contextual information over long interactions—deciding when to update beliefs, preserve state, or ignore noise. The study reveals vanilla LLMs fail significantly at this task, while reinforcement learning with belief-state rewards reduces failures by 71% on average.

AIBullisharXiv – CS AI · May 297/10

🧠

SCOPE: Prompt Evolution for Enhancing Agent Effectiveness

Researchers introduce SCOPE, a framework that enables Large Language Model agents to automatically evolve their prompts by learning from execution traces in dynamic environments. The system improves task success rates from 14.23% to 38.64% on benchmark tests, addressing a critical limitation in how LLM agents manage complex, changing contexts without human intervention.

AIBullisharXiv – CS AI · May 117/10

🧠

The Context Gathering Decision Process: A POMDP Framework for Agentic Search

Researchers introduce the Context Gathering Decision Process (CGDP), a POMDP framework that formalizes how LLM agents should search and gather information from environments exceeding their context windows. The approach yields measurable improvements in multi-hop reasoning (up to 11.4%) and token efficiency (up to 39% savings) through explicit belief state management and programmatic exhaustion detection.

AIBullisharXiv – CS AI · May 77/10

🧠

LCM: Lossless Context Management

Researchers introduce Lossless Context Management (LCM), a deterministic architecture for LLM memory that outperforms Claude Code on long-context tasks up to 1M tokens. LCM combines recursive context compression with engine-managed task partitioning, representing an evolution of recursive language models that prioritizes reliability and state retrievability over flexibility.

🧠 Claude🧠 Opus

AINeutralarXiv – CS AI · Apr 107/10

🧠

ATANT: An Evaluation Framework for AI Continuity

Researchers introduce ATANT, an open evaluation framework designed to measure whether AI systems can maintain coherent context and continuity across time without confusing information across different narratives. The framework achieves up to 100% accuracy in isolated scenarios but drops to 96% when managing 250 simultaneous narratives, revealing practical limitations in current AI memory architectures.

AINeutralarXiv – CS AI · Jun 106/10

🧠

HIPIF: Hierarchical Planning and Information Folding for Long-Horizon LLM Agent Learning

Researchers propose HIPIF, a novel training method that improves Large Language Model agents' performance on complex multi-step tasks by organizing execution around explicit subgoals and summarizing completed progress to reduce interference from growing context. The approach combines hierarchical planning with reward mechanisms, demonstrating improvements on three public benchmarks without requiring costly auxiliary models.

AINeutralarXiv – CS AI · Jun 106/10

🧠

Learning What to Remember: Observability-Safe Memory Retention via Constrained Optimization for Long-Horizon Language Agents

Researchers introduce OSL-MR, a framework that optimizes memory retention for long-horizon language agents by treating it as a constrained optimization problem rather than local decisions. The approach combines learned evidence valuation with heuristic scoring while respecting real-world observability constraints, demonstrating superior performance over existing methods on benchmark datasets.

AIBullisharXiv – CS AI · Jun 96/10

🧠

SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research

Researchers present SearchSwarm, a framework that trains large language models to intelligently delegate complex tasks to subagents while managing finite context windows. The resulting 30B-parameter model achieves state-of-the-art performance on research benchmarks by learning when and what to delegate, addressing a critical bottleneck in agentic AI systems.

AINeutralarXiv – CS AI · Jun 96/10

🧠

Context Rot in AI-Assisted Software Development: Repurposing Documentation Consistency for AI Configuration Artifacts

Researchers identify 'context rot'—the degradation of AI configuration files that guide coding assistants—as a significant problem affecting 23% of repositories studied. The study proposes adapting decades-old documentation consistency tools to detect stale context in AI artifacts like CLAUDE.md and .cursorrules files, establishing a research framework for maintaining AI tool guidance accuracy.

AIBullisharXiv – CS AI · Jun 96/10

🧠

DYCP: Dynamic Context Pruning for Long-Form Dialogue with LLMs

Researchers introduce DyCP, a lightweight context management system that dynamically selects relevant dialogue segments for long-form conversations with large language models, improving inference efficiency without offline preprocessing. The method demonstrates competitive performance across multiple LLM benchmarks while reducing computational costs and latency in real-world dialogue applications.

AINeutralarXiv – CS AI · Jun 36/10

🧠

Handoff Debt: The Rediscovery Cost When Coding Agents Take Over Interrupted Tasks

Researchers introduce 'handoff debt,' a framework measuring the efficiency cost when coding agents resume interrupted tasks from incomplete states. Testing across 75 tasks and 724 takeover runs, they found that providing context-bearing handoff information (traces, notes, structured documentation) reduces agent event counts by 20-59% and token consumption by 42-63% compared to repository-only takeover, suggesting current agent benchmarks underestimate real-world deployment costs.

AINeutralarXiv – CS AI · Jun 26/10

🧠

Masking Stale Observations Helps Search Agents -- Until It Doesn't: A Regime Map and Its Mechanism

Researchers systematically studied how masking outdated information improves long-horizon search agents' efficiency, finding that benefits follow an inverted-U pattern dependent on model capacity and retriever quality. The effect collapses when models become saturated, revealing that context management success depends on balancing retriever performance with a model's implicit filtering capacity rather than either factor alone.

AIBullisharXiv – CS AI · Jun 16/10

🧠

Learning Agent-Compatible Context Management for Long-Horizon Tasks

Researchers introduce Adaptive Context Management (AdaCoM), an external LLM-based system that optimizes how AI agents handle long-context tasks by learning agent-specific compression strategies through reinforcement learning. The approach improves performance on web search and research benchmarks while avoiding the need to retrain frozen agents, revealing that high-performing agents benefit from preserving context fidelity while weaker agents need more aggressive compression.

AIBullisharXiv – CS AI · May 296/10

🧠

Enhancing Multi-Agent Communication through Attention Steering with Context Relevance

Researchers introduce Agent-Radar, a training-free context management method that improves multi-agent LLM systems by dynamically filtering irrelevant information from long conversation histories. The technique uses temporal and spatial decay mechanisms to maintain focus on relevant context, achieving up to 7.64% performance improvements across five benchmarks.

AIBullisharXiv – CS AI · May 296/10

🧠

Loong: A Human-Like Long Document Translation Agent with Observe-and-Act Adaptive Context Selection

Researchers introduce Loong, an AI agent designed to improve long document translation by selectively retrieving relevant context from a 3E memory module rather than processing all available information. The system uses reinforcement learning to optimize context selection and demonstrates significant translation quality improvements across multiple language pairs, achieving gains up to 13 points on standard evaluation metrics.

AINeutralarXiv – CS AI · May 286/10

🧠

Position: The Turing-Completeness of Autoregressive Transformers Relies Heavily on Context Management

A new arXiv paper challenges the widespread claim that Transformers are Turing-complete, arguing that existing proofs conflate two distinct computational settings. The research clarifies that real-world LLM deployment operates under fixed-system constraints where context management critically determines actual computational power, rather than the idealized scaling-family setting used in most theoretical proofs.

AIBullisharXiv – CS AI · May 116/10

🧠

AgentProg: Empowering Long-Horizon GUI Agents with Program-Guided Context Management

AgentProg introduces a novel program-guided context management system for long-horizon GUI agents that addresses the critical bottleneck of expanding interaction history overhead. By reframing interaction history as structured programs with variables and control flow, the approach preserves semantic information while reducing context requirements, achieving state-of-the-art performance on AndroidWorld benchmarks while maintaining robustness on extended tasks.

AIBullisharXiv – CS AI · May 116/10

🧠

MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning

Researchers introduce MemSearcher, an AI agent framework that optimizes how large language models handle multi-turn interactions by maintaining compact memory instead of concatenating full conversation history. The approach uses a novel multi-context GRPO training method and demonstrates superior performance while maintaining stable token counts, reducing computational overhead.

AINeutralarXiv – CS AI · Apr 106/10

🧠

Mixed-Initiative Context: Structuring and Managing Context for Human-AI Collaboration

Researchers propose Mixed-Initiative Context, a framework that reconceptualizes how multi-turn AI interactions are managed by treating context as an explicit, structured, and dynamically adjustable object rather than a fixed chronological sequence. The approach enables both humans and AI to actively participate in context construction, addressing current limitations where irrelevant exchanges clutter context windows and users lack direct control mechanisms.

AIBullisharXiv – CS AI · Mar 66/10

🧠

Building AI Coding Agents for the Terminal: Scaffolding, Harness, Context Engineering, and Lessons Learned

Researchers have developed OPENDEV, an open-source command-line AI coding agent that operates directly in terminal environments where developers manage source control and deployments. The system uses a compound AI architecture with dual-agent design, specialized model routing, and adaptive context management to provide autonomous coding assistance while maintaining safety controls.