AI · Bullish · arXiv – CS AI · Jun 5 · 7/10
🧠 Researchers introduce AdaMEM, a test-time adaptive memory framework that enables language agents to dynamically adjust behavior during inference without updating model parameters. The system combines persistent offline trajectory memory with dynamically generated on-the-fly strategy memory, demonstrating 11-13% performance improvements on complex reasoning and web interaction tasks.
AI · Bullish · arXiv – CS AI · Jun 5 · 7/10
🧠 Researchers present CVT-RL, a reinforcement learning algorithm that targets the tendency of long-horizon language agents to learn shortcuts and unsupported reasoning chains. It introduces policy-conditioned counterfactual credit estimation and intervention-validity gating. The method achieves 78.9% task success and reduces measured hacking attempts from 7.2% to 3.9%, improving agent reliability and verifiability.
AI · Bullish · arXiv – CS AI · May 12 · 7/10
🧠 Researchers introduce CIVeX, a causal intervention verifier that validates whether tool-calling language agents' proposed actions will actually produce intended effects in real-world execution. The system achieves zero false executions under adversarial conditions and outperforms LLM-based verification approaches by ensuring causal identifiability rather than just schema validity.
🧠 Claude
AI · Bullish · arXiv – CS AI · May 12 · 7/10
🧠 Researchers propose DeMem, a decision-centric memory framework that optimizes agent memory allocation based on preserving distinctions needed for sound decision-making rather than descriptive accuracy. Using rate-distortion theory, the approach identifies what information can be safely forgotten under memory constraints and demonstrates performance gains on long-horizon language agent tasks.
AI · Bullish · arXiv – CS AI · May 9 · 7/10
🧠 Researchers introduce BEACON, a milestone-guided policy learning framework that improves training efficiency for long-horizon language agents by addressing credit misattribution and sample inefficiency. The approach achieves 92.9% success rates on complex tasks, nearly double previous benchmarks, while improving sample utilization from 23.7% to 82.0%.
AI · Neutral · arXiv – CS AI · Jun 11 · 6/10
🧠 Researchers propose ACTION-RATING, a framework enabling hierarchical AI agents to recognize uncertainty and request clarification as a direct action competing with navigation decisions. Testing on a 30,000-node taxonomy shows information-seeking effectiveness rising from 50% to 74% as agents shift from mandatory to opportunistic clarification modes, with accuracy gains up to 16.2%.
AI · Neutral · arXiv – CS AI · Jun 10 · 6/10
🧠 Researchers demonstrate that spatial memory systems for language agents must fundamentally separate memory recall from visibility computation, using occlusion testing as a validation method. The study shows that geometry-based weighting outperforms traditional blending approaches, and introduces a ray-casting technique to properly handle occluded spatial information.
AI · Neutral · arXiv – CS AI · Jun 10 · 6/10
🧠 Researchers introduce OSL-MR, a framework that optimizes memory retention for long-horizon language agents by treating it as a constrained optimization problem rather than a series of local decisions. The approach combines learned evidence valuation with heuristic scoring while respecting real-world observability constraints, demonstrating superior performance over existing methods on benchmark datasets.
AI · Bullish · arXiv – CS AI · Jun 4 · 6/10
🧠 Researchers introduce State-Grounded Dynamic Retrieval (SGDR), a new method enabling language agents to dynamically reuse learned skills during web automation tasks. By matching skills to both task goals and current webpage states rather than relying on fixed skill sets, SGDR achieves 10.6% relative performance gains over existing approaches on complex multi-step web tasks.
🧠 GPT-4
AI · Neutral · arXiv – CS AI · Jun 2 · 6/10
🧠 Researchers introduce AgentCL, an evaluation framework for assessing continual learning in language agents, along with MemProbe, a memory design method that helps agents accumulate and reuse knowledge across tasks while avoiding interference. The framework uses controlled task streams to rigorously measure how well agents learn and transfer knowledge over time, revealing that current memory designs struggle to balance learning plasticity with stable knowledge reuse.
AI · Bullish · arXiv – CS AI · Jun 2 · 6/10
🧠 Researchers propose PaW, a co-training framework that enhances language model agents by simultaneously optimizing reinforcement learning policies and world models using data from standard RL rollouts. The approach eliminates the need for separate simulators or training stages while demonstrating consistent improvements across multiple benchmarks.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠 Researchers demonstrate that unpredictability in language agents does not equate to effective control, finding that structured decision-making mechanisms significantly outperform stochastic sampling across 74,352 test cases. The study challenges assumptions about randomness and control in AI systems, with implications for agent reliability and interpretability.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 5
🧠 Researchers have developed REMem, a new framework that enables AI language agents to form and reason with episodic memory similar to humans. The system uses a two-phase approach with offline memory graph indexing and online agentic retrieval, showing significant improvements over existing memory systems like Mem0 and HippoRAG 2.
AI · Neutral · arXiv – CS AI · Feb 27 · 5/10 · 2
🧠 Researchers propose using cognitive models and AI algorithms as templates for designing modular language agents that combine multiple large language models. The position paper formalizes agent templates that specify roles for individual LLMs and how their functionalities should be composed to solve complex problems beyond single model capabilities.