AI · Bullish · arXiv – CS AI · 4d ago · 7/10
Researchers demonstrate that selective context management (retaining only recent tool interactions plus automated summarization) enables LLM agents to complete enterprise workflows with a 91.6% success rate while cutting token consumption and runtime by roughly 63% compared with full-history retention. The findings challenge the assumption that retaining as much context as possible improves agent performance on long-horizon tasks.
Tags: GPT-5 · Claude · Sonnet
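The retention policy described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: `ToolTurn`, `summarize`, `build_context` and the `keep_recent` window size are hypothetical names, and `summarize` is a trivial placeholder where a real agent would call an LLM.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class ToolTurn:
    call: str    # e.g. the tool name plus arguments
    result: str  # the tool's output


def summarize(turns: list[ToolTurn]) -> str:
    # Hypothetical placeholder for an LLM summarizer: it only lists the older calls.
    return "Earlier tool calls: " + ", ".join(t.call for t in turns)


def build_context(task: str, turns: list[ToolTurn], keep_recent: int = 3) -> list[str]:
    """Keep the last `keep_recent` tool turns verbatim and collapse
    everything older into a single summary line."""
    split = max(len(turns) - keep_recent, 0)
    older, recent = turns[:split], turns[split:]
    context = [f"Task: {task}"]
    if older:
        context.append(summarize(older))
    for t in recent:
        context.append(f"{t.call} -> {t.result}")
    return context
```

With `keep_recent=2` and six prior turns, the prompt holds the task line, one summary line and the two latest tool results; that length stays fixed however long the episode runs, which is the intuition behind the reported token and runtime savings.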
AI · Bullish · arXiv – CS AI · May 29 · 7/10
Researchers propose A2X, an LLM-native service discovery system that organizes thousands of callable services into hierarchical taxonomies to address the context-window limits that AI agents face. The approach improves retrieval accuracy by more than 20 points while reducing token consumption to one-ninth of baseline methods, enabling scalable orchestration of distributed services.
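A hypothetical sketch of hierarchical service discovery: instead of placing every service description in the prompt, the agent descends a taxonomy and compares only the children of the current node. Keyword overlap stands in for the LLM's branch choice here; `Node`, `score`, `discover` and the example taxonomy are illustrative assumptions, not A2X's actual design.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    keywords: set[str]
    children: list[Node] = field(default_factory=list)
    service: Optional[str] = None  # only leaf nodes name a callable service


def score(query_words: set[str], node: Node) -> int:
    # Hypothetical stand-in for asking an LLM which branch fits the request.
    return len(query_words & node.keywords)


def discover(root: Node, request: str) -> Optional[str]:
    """Descend greedily; each step compares only the current node's
    children, never the full service catalog."""
    words = set(request.lower().split())
    node = root
    while node.children:
        node = max(node.children, key=lambda child: score(words, child))
    return node.service


# Illustrative two-level taxonomy (hypothetical service ids).
taxonomy = Node("root", set(), [
    Node("finance", {"invoice", "payment", "refund"}, [
        Node("charge", {"payment", "charge"}, service="payments.charge"),
        Node("refund", {"refund", "return"}, service="payments.refund"),
    ]),
    Node("travel", {"flight", "hotel", "booking"}, [
        Node("flights", {"flight"}, service="travel.book_flight"),
        Node("hotels", {"hotel"}, service="travel.book_hotel"),
    ]),
])
```

Because only one level of candidates is in view at a time, the context cost grows with the taxonomy's branching factor and depth rather than with the total number of services.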
AI · Neutral · arXiv – CS AI · May 29 · 7/10
Researchers introduce BeliefTrack, a benchmark for evaluating how large language models manage contextual information over long interactions: deciding when to update beliefs, preserve state, or ignore noise. The study finds that vanilla LLMs perform poorly on this task, while reinforcement learning with belief-state rewards reduces failures by 71% on average.
AI · Bullish · arXiv – CS AI · May 29 · 7/10
Researchers introduce SCOPE, a framework that lets Large Language Model agents automatically evolve their prompts by learning from execution traces in dynamic environments. The system raises task success rates from 14.23% to 38.64% on benchmark tests, addressing a key limitation in how LLM agents manage complex, changing contexts without human intervention.
AI · Bullish · arXiv – CS AI · May 11 · 7/10
Researchers introduce the Context Gathering Decision Process (CGDP), a POMDP framework that formalizes how LLM agents should search and gather information from environments larger than their context windows. Through explicit belief-state management and programmatic exhaustion detection, the approach improves multi-hop reasoning by up to 11.4% and saves up to 39% of tokens.
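The idea of tracking what has been gathered and stopping once a source appears exhausted can be illustrated with a simple loop. This is an assumption-laden sketch, not CGDP itself: the belief state is just a set of facts, and exhaustion is approximated by a hypothetical `patience` count of consecutive queries that return nothing new.

```python
from __future__ import annotations
from typing import Callable, Iterable


def gather(search: Callable[[str], Iterable[str]],
           queries: Iterable[str],
           patience: int = 2) -> tuple[set[str], int]:
    belief: set[str] = set()  # facts collected so far (a crude belief state)
    misses = 0                # consecutive queries that added nothing new
    used = 0                  # queries actually issued
    for q in queries:
        used += 1
        new_facts = set(search(q)) - belief
        if new_facts:
            belief |= new_facts
            misses = 0
        else:
            misses += 1
            if misses >= patience:
                break  # treat the source as exhausted and stop spending tokens
    return belief, used
```

Stopping early trades a risk of missing late evidence for fewer wasted queries; a real system would presumably base that decision on its belief state rather than on a fixed counter.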
AI · Bullish · arXiv – CS AI · May 7 · 7/10
Researchers introduce Lossless Context Management (LCM), a deterministic architecture for LLM memory that outperforms Claude Code on long-context tasks of up to 1M tokens. LCM combines recursive context compression with engine-managed task partitioning; it extends recursive language models while prioritizing reliability and state retrievability over flexibility.
Tags: Claude · Opus
AI · Neutral · arXiv – CS AI · Apr 10 · 7/10
Researchers introduce ATANT, an open evaluation framework that measures whether AI systems can maintain coherent context and continuity over time without mixing up information from different narratives. Under the framework, accuracy reaches up to 100% in isolated scenarios but drops to 96% when 250 narratives are managed simultaneously, revealing practical limits of current AI memory architectures.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
Researchers propose HIPIF, a training method that improves Large Language Model agents' performance on complex multi-step tasks by organizing execution around explicit subgoals and summarizing completed progress to reduce interference from a growing context. The approach combines hierarchical planning with reward mechanisms and shows improvements on three public benchmarks without requiring costly auxiliary models.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
Researchers introduce OSL-MR, a framework that optimizes memory retention for long-horizon language agents by treating it as a constrained optimization problem rather than a series of local decisions. The approach combines learned evidence valuation with heuristic scoring while respecting real-world observability constraints, and it outperforms existing methods on benchmark datasets.
AI · Bullish · arXiv – CS AI · 5d ago · 6/10
Researchers present SearchSwarm, a framework that trains large language models to delegate complex tasks to subagents while managing finite context windows. The resulting 30B-parameter model achieves state-of-the-art performance on research benchmarks by learning when and what to delegate, addressing a key bottleneck in agentic AI systems.
AI · Neutral · arXiv – CS AI · 5d ago · 6/10
Researchers identify "context rot", the degradation of the configuration files that guide AI coding assistants, as a significant problem affecting 23% of the repositories studied. The study proposes adapting decades-old documentation consistency tools to detect stale context in AI artifacts such as CLAUDE.md and .cursorrules files, establishing a research framework for keeping AI tool guidance accurate.
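One way to picture a documentation-consistency check applied to an AI configuration file is to verify that every file path it mentions still exists. The sketch below assumes paths appear in backticks; the `PATH_REF` regex and the `stale_references` helper are hypothetical and far simpler than the tools the study proposes adapting.

```python
from __future__ import annotations
import re
from pathlib import Path

# Hypothetical heuristic: a backtick-quoted token with a file extension is a path.
PATH_REF = re.compile(r"`([\w./-]+\.[A-Za-z0-9]+)`")


def stale_references(config_text: str, repo_root: Path) -> list[str]:
    """List paths mentioned in an AI config file that no longer exist in the repo."""
    return [ref for ref in PATH_REF.findall(config_text)
            if not (repo_root / ref).exists()]


if __name__ == "__main__":
    text = Path("CLAUDE.md").read_text() if Path("CLAUDE.md").exists() else ""
    for ref in stale_references(text, Path(".")):
        print(f"stale reference: {ref}")
```

Run in CI, a check like this flags guidance that points at renamed or deleted files before an assistant acts on it.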
AI · Bullish · arXiv – CS AI · 5d ago · 6/10
Researchers introduce DyCP, a lightweight context management system that dynamically selects relevant dialogue segments in long conversations with large language models, improving inference efficiency without offline preprocessing. The method performs competitively across multiple LLM benchmarks while reducing computational cost and latency in real-world dialogue applications.
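Dynamic selection of dialogue segments can be sketched as scoring each past segment against the current query and keeping only the best matches, in their original order. The lexical-overlap score and the `select_segments` helper are hypothetical stand-ins; the summary does not describe how DyCP actually scores relevance.

```python
from __future__ import annotations


def select_segments(segments: list[str], query: str, k: int = 2) -> list[str]:
    """Keep up to k segments that share the most words with the query,
    returned in their original conversational order."""
    q = set(query.lower().split())
    scored = [(len(q & set(seg.lower().split())), i) for i, seg in enumerate(segments)]
    # Highest overlap first; earlier segments win ties.
    best = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    # Drop segments with no overlap, then restore chronological order.
    keep = sorted(i for overlap, i in best if overlap > 0)
    return [segments[i] for i in keep]
```

Selection happens per query at inference time, so nothing has to be indexed offline; a practical system would replace word overlap with embeddings or a learned scorer.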
AI · Neutral · arXiv – CS AI · Jun 3 · 6/10
Researchers introduce "handoff debt", a framework for measuring the efficiency cost when coding agents resume interrupted tasks from incomplete states. Across 75 tasks and 724 takeover runs, they found that providing context-bearing handoff information (traces, notes, structured documentation) reduces agent event counts by 20-59% and token consumption by 42-63% compared with repository-only takeover, suggesting that current agent benchmarks underestimate real-world deployment costs.
AI · Neutral · arXiv – CS AI · Jun 2 · 6/10
Researchers systematically studied how masking outdated information improves the efficiency of long-horizon search agents, finding that the benefits follow an inverted-U pattern that depends on model capacity and retriever quality. The effect collapses once models become saturated, indicating that successful context management depends on balancing retriever performance against a model's implicit filtering capacity rather than on either factor alone.
AI · Bullish · arXiv – CS AI · Jun 1 · 6/10
Researchers introduce Adaptive Context Management (AdaCoM), an external LLM-based system that optimizes how AI agents handle long-context tasks by learning agent-specific compression strategies through reinforcement learning. The approach improves performance on web search and research benchmarks without retraining the frozen agents, and it shows that high-performing agents benefit from preserving context fidelity while weaker agents need more aggressive compression.
AI · Bullish · arXiv – CS AI · May 29 · 6/10
Researchers introduce Agent-Radar, a training-free context management method that improves multi-agent LLM systems by dynamically filtering irrelevant information from long conversation histories. The technique uses temporal and spatial decay mechanisms to keep the focus on relevant context, achieving performance improvements of up to 7.64% across five benchmarks.
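A minimal sketch of decay-based filtering, assuming each message carries a relevance estimate, the turn it was sent, and its distance (in hops) from the receiving agent. The exponential form, the `Message` fields and the `alpha`, `beta` and `threshold` values are illustrative assumptions, not Agent-Radar's published formulation.

```python
from __future__ import annotations
import math
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    turn: int         # turn at which the message was sent
    hops: int         # hypothetical: graph distance from the receiving agent
    relevance: float  # hypothetical: 0..1 relevance estimate


def decayed_score(m: Message, current_turn: int, alpha: float, beta: float) -> float:
    age = current_turn - m.turn
    # Temporal decay shrinks old messages; spatial decay shrinks distant agents.
    return m.relevance * math.exp(-alpha * age) * math.exp(-beta * m.hops)


def filter_history(history: list[Message], current_turn: int,
                   threshold: float = 0.2, alpha: float = 0.3,
                   beta: float = 0.5) -> list[Message]:
    return [m for m in history
            if decayed_score(m, current_turn, alpha, beta) >= threshold]
```

For example, at turn 10 a fresh, nearby message with relevance 0.8 scores about 0.59 and is kept, while an equally relevant message from turn 2 decays to below 0.1 and is dropped. No training is involved, which matches the training-free framing, though the decay rates would still need tuning.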
AI · Bullish · arXiv – CS AI · May 29 · 6/10
Researchers introduce Loong, an AI agent designed to improve long-document translation by selectively retrieving relevant context from a 3E memory module rather than processing all available information. The system uses reinforcement learning to optimize context selection and substantially improves translation quality across multiple language pairs, with gains of up to 13 points on standard evaluation metrics.
AI · Neutral · arXiv – CS AI · May 28 · 6/10
A new arXiv paper challenges the widespread claim that Transformers are Turing-complete, arguing that existing proofs conflate two distinct computational settings. The paper clarifies that real-world LLM deployment operates in a fixed-system setting, where context management determines the model's actual computational power, rather than in the idealized scaling-family setting assumed by most theoretical proofs.
AI · Bullish · arXiv – CS AI · May 11 · 6/10
AgentProg is a program-guided context management system for long-horizon GUI agents that addresses the growing overhead of an expanding interaction history. By reframing interaction history as structured programs with variables and control flow, the approach preserves semantic information while reducing context requirements, achieving state-of-the-art performance on the AndroidWorld benchmark and remaining robust on extended tasks.
AI · Bullish · arXiv – CS AI · May 11 · 6/10
Researchers introduce MemSearcher, an AI agent framework that optimizes how large language models handle multi-turn interactions by maintaining a compact memory instead of concatenating the full conversation history. The approach uses a new multi-context GRPO training method and delivers stronger performance while keeping token counts stable, reducing computational overhead.
AI · Neutral · arXiv – CS AI · Apr 10 · 6/10
Researchers propose Mixed-Initiative Context, a framework that reconceptualizes how multi-turn AI interactions are managed by treating context as an explicit, structured, and dynamically adjustable object rather than a fixed chronological sequence. The approach enables both humans and AI to actively participate in context construction, addressing current limitations where irrelevant exchanges clutter context windows and users lack direct control mechanisms.
AI · Bullish · arXiv – CS AI · Mar 6 · 6/10
Researchers have developed OPENDEV, an open-source command-line AI coding agent that operates directly in the terminal environments where developers manage source control and deployments. The system uses a compound AI architecture with a dual-agent design, specialized model routing, and adaptive context management to provide autonomous coding assistance while maintaining safety controls.