AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers introduce PEAM, a parametric memory framework for AI agents in Minecraft that consolidates learned skills directly into model parameters rather than relying on retrieval-based memory. The system uses a mixture-of-experts architecture with contrastive learning to internalize both successful and failed experiences, achieving better long-horizon task performance while avoiding catastrophic forgetting.
AI · Bullish · arXiv – CS AI · 5d ago · 6/10
🧠Researchers introduce POLAR, a memory-augmented framework that enables multimodal AI agents to personalize their behavior based on accumulated long-term user interactions. The system organizes past interactions into semantic and episodic memory, allowing embodied agents to interpret implicit user requests and improve task execution performance across multiple interaction cycles.
AI · Neutral · arXiv – CS AI · 5d ago · 6/10
🧠Researchers introduce AgingBench, a longitudinal reliability benchmark that evaluates how AI agents degrade over time in production environments rather than just at deployment. The study reveals that agent reliability decays through four distinct mechanisms—compression, interference, revision, and maintenance aging—and that fixes must target specific failure stages rather than assuming stronger base models solve the problem.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers propose HAGE, a weighted multi-relational memory framework that improves how large language model agents retrieve and traverse information by treating memory as a dynamic graph rather than static lookups. The system uses reinforcement learning to optimize edge representations and routing behavior, achieving better long-horizon reasoning accuracy with improved efficiency compared to existing agentic memory systems.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers present a scale-conditioned evaluation protocol for AI agent memory systems that tests whether stored evidence remains usable as irrelevant data accumulates. Testing across multiple memory architectures and language models reveals that reliability degrades unpredictably with scale, with some models exceeding computational budgets while others maintain performance, suggesting memory scalability claims must be conditioned on specific agent-interface-scale combinations.
AI · Neutral · arXiv – CS AI · May 9 · 6/10
🧠Researchers analyzed internal mechanisms of LLM-based agent memory systems across the Qwen model family, discovering that routing circuits activate before content extraction circuits—a critical gap in small models. They developed an unsupervised diagnostic tool achieving 76.2% accuracy in identifying where silent memory failures occur, providing practical insights for improving agent reliability.
AI · Neutral · arXiv – CS AI · May 4 · 6/10
🧠Researchers introduce MemoryBench, a new benchmark for evaluating how large language models learn and improve from accumulated user feedback over time. The framework addresses limitations in existing memory benchmarks by testing continual learning across multiple domains and languages, revealing that current state-of-the-art systems perform poorly on these tasks.
AI · Neutral · arXiv – CS AI · May 1 · 6/10
🧠Researchers introduce RSCB-MC, a risk-sensitive contextual bandit system that improves how LLM-based coding agents decide whether to use external memory for debugging tasks. Rather than treating memory retrieval as a simple similarity-matching problem, the system treats it as a safety-critical control problem, achieving a 62.5% success rate with zero false positives in testing.
AI · Neutral · arXiv – CS AI · Apr 20 · 6/10
🧠Researchers propose the Experience Compression Spectrum, a unifying framework that reconciles two separate research communities studying LLM agent memory and skill discovery by positioning them along a single compression axis. The framework identifies a critical gap—no existing system supports adaptive cross-level compression—and reveals that memory systems and skill discovery communities operate in isolation despite solving overlapping problems.
AI · Bullish · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers introduce M★, a method that automatically evolves task-specific memory systems for large language model agents by treating memory architecture as executable Python code. The approach outperforms fixed memory designs across conversation, planning, and reasoning benchmarks, suggesting that specialized memory mechanisms can beat one-size-fits-all solutions.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers identify a critical architectural gap in leading AI agent frameworks (CoALA and JEPA), which lack an explicit Knowledge layer with distinct persistence semantics. The paper proposes a four-layer decomposition model with fundamentally different update mechanics for knowledge, memory, wisdom, and intelligence, with working implementations demonstrating feasibility.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers introduce LIFESTATE-BENCH, a benchmark for evaluating lifelong learning capabilities in large language models through multi-turn interactions using narrative datasets like Hamlet. Testing shows nonparametric approaches significantly outperform parametric methods, but all models struggle with catastrophic forgetting over extended interactions, revealing fundamental limitations in LLM memory and consistency.
🧠 GPT-4 · 🧠 Llama
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers introduce MERMAID, a memory-enhanced multi-agent framework for automated fact-checking that couples evidence retrieval with reasoning processes. The system achieves state-of-the-art performance on multiple benchmarks by reusing retrieved evidence across claims, reducing redundant searches and improving verification efficiency.
AI · Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠Researchers have developed Memory Intelligence Agent (MIA), a new AI framework that improves deep research agents through a Manager-Planner-Executor architecture with advanced memory systems. The framework enables continuous learning during inference and demonstrates superior performance across eleven benchmarks through enhanced cooperation between parametric and non-parametric memory systems.
AI · Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠Researchers have released SuperLocalMemory V3.3, an open-source AI agent memory system that operates entirely locally without cloud LLMs, implementing biologically-inspired forgetting mechanisms and multi-channel retrieval. The system scores 70.4% on the LoCoMo benchmark while running on CPU only, addressing the paradox of AI agents having vast knowledge but poor conversational memory.
AI · Bullish · arXiv – CS AI · Mar 27 · 6/10
🧠Researchers have introduced ElephantBroker, an open-source cognitive runtime system that combines knowledge graphs with vector storage to create more trustworthy AI agents with verifiable memory. The system implements comprehensive safety measures, evidence verification, and multi-organizational access controls for enterprise AI deployments.
AI · Neutral · arXiv – CS AI · Mar 26 · 6/10
🧠Researchers introduced Enhanced Mycelium of Thought (EMoT), a bio-inspired AI reasoning framework that organizes cognitive processing into four hierarchical levels with strategic dormancy and memory encoding. The system achieved near-parity with Chain-of-Thought reasoning on complex problems but significantly underperformed on simple tasks, with 33-fold higher computational costs.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduced NS-Mem, a neuro-symbolic memory framework that combines neural representations with symbolic structures to improve multimodal AI agent reasoning. The system achieved a 4.35% average improvement in reasoning accuracy over pure neural systems, with gains of up to 12.5% on constrained reasoning tasks.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce CLAG, a clustering-based memory framework that helps small language model agents organize and retrieve information more effectively. The system addresses memory dilution issues by creating semantic clusters with automated profiles, showing improved performance across multiple QA datasets.
AI · Bullish · arXiv – CS AI · Mar 12 · 6/10
🧠Researchers introduce a new framework for AI agent systems that automatically extracts learnings from execution trajectories to improve future performance. The system uses four components, including trajectory analysis and contextual memory retrieval, and improves task completion on benchmarks by up to 14.3 percentage points.
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
🧠Researchers introduce Semantic Level of Detail (SLoD), a framework for AI memory systems that uses heat kernel diffusion on hyperbolic manifolds to enable continuous resolution control in knowledge graphs. The method automatically detects meaningful abstraction levels without manual parameters, achieving perfect recovery on synthetic hierarchies and strong alignment with real-world taxonomies like WordNet.
AI · Neutral · arXiv – CS AI · Mar 4 · 5/10 · 3
🧠Researchers developed V-GEMS, a new multimodal AI agent architecture that improves web navigation by combining visual grounding with explicit memory systems. The system achieved a 28.7% performance improvement over existing baselines by preventing navigation loops and enabling better backtracking through structured path mapping.
AI · Bullish · arXiv – CS AI · Mar 4 · 5/10 · 2
🧠Researchers introduce MultiSessionCollab, a benchmark for evaluating conversational AI agents' ability to learn and adapt to user preferences across multiple collaboration sessions. The study demonstrates that equipping agents with persistent memory significantly improves long-term collaboration quality, task success rates, and user experience.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 9
🧠Researchers introduce GAM-RAG, a training-free framework that improves Retrieval-Augmented Generation by building adaptive memory from past queries instead of relying on static indices. The system uses uncertainty-aware updates inspired by cognitive neuroscience to balance stability and adaptability, achieving 3.95% better performance while reducing inference costs by 61%.
AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 10
🧠Researchers introduce ATM-Bench, the first benchmark for evaluating AI assistants' ability to recall and reason over long-term personalized memory across multiple modalities. The benchmark reveals poor performance (under 20% accuracy) for current state-of-the-art memory systems, highlighting significant limitations in personalized AI capabilities.