AINeutralarXiv – CS AI · May 126/10
🧠Researchers present MCP-Cosmos, a framework integrating World Models into the Model Context Protocol ecosystem to enhance LLM agent planning and execution. The approach demonstrates measurable improvements in tool success rates and parameter accuracy across multiple benchmark tasks by enabling agents to simulate outcomes before taking actions.
AIBullisharXiv – CS AI · May 116/10
🧠Researchers introduce Group of Skills (GoSkills), a new method for organizing and retrieving skills in AI agent libraries that presents skills as structured execution contexts rather than flat lists. The approach improves agent performance on benchmark tasks while maintaining efficiency and doesn't require changes to existing agent systems.
AINeutralarXiv – CS AI · May 116/10
🧠Region4Web introduces a novel framework that reorganizes how AI web agents perceive and process web pages by shifting from element-level to functional region-level observation granularity. The approach, validated on WebArena benchmark, reduces observation length while improving task success rates across multiple LLM models, demonstrating that hierarchical abstraction of page structure yields more efficient agent performance.
AINeutralarXiv – CS AI · May 96/10
🧠Researchers present Experience-RAG Skill, an agent-oriented system that dynamically selects optimal retrieval strategies based on task context, rather than using a single fixed pipeline. The system achieves competitive performance across diverse question-answering tasks by leveraging experience memory to orchestrate retrieval, demonstrating that strategy selection can be implemented as a reusable agent component.
AINeutralarXiv – CS AI · May 96/10
🧠Researchers present MEMSAD, a defense mechanism against memory poisoning attacks on retrieval-augmented LLM agents, using gradient-coupled anomaly detection to identify adversarial perturbations while maintaining retrieval performance. The work formalizes security vulnerabilities in persistent external memory systems and demonstrates that while composite defenses achieve perfect detection rates, synonym-based attacks remain undetectable by embedding-based approaches.
AINeutralarXiv – CS AI · Apr 206/10
🧠Researchers propose the Experience Compression Spectrum, a unifying framework that reconciles two separate research communities studying LLM agent memory and skill discovery by positioning them along a single compression axis. The framework identifies a critical gap—no existing system supports adaptive cross-level compression—and reveals that memory systems and skill discovery communities operate in isolation despite solving overlapping problems.
AINeutralarXiv – CS AI · Apr 206/10
🧠A comprehensive survey examines how Large Language Models can be effectively integrated with graph-based data structures to improve reasoning, retrieval, and decision-making across domains. The research categorizes integration approaches by purpose, graph type, and strategy, providing practitioners with guidance on selecting appropriate techniques for specific applications in healthcare, finance, robotics, and other fields.
AINeutralarXiv – CS AI · Apr 206/10
🧠Researchers introduce GTA-2, a hierarchical benchmark that evaluates AI agents on both atomic tool-use tasks and complex, open-ended workflows using real user queries and deployed tools. The study reveals a significant capability cliff where frontier AI models achieve below 50% success on atomic tasks and only 14.39% on realistic workflows, highlighting that execution framework design matters as much as underlying model capacity.
AINeutralarXiv – CS AI · Apr 156/10
🧠Researchers investigated whether self-monitoring mechanisms (metacognition, self-prediction, duration estimation) improve reinforcement learning agents in predator-prey environments. Initial auxiliary-loss implementations provided no benefits, but structurally integrating these modules into decision pathways showed modest improvements, suggesting effective AI enhancement requires architectural embedding rather than add-on approaches.
AINeutralarXiv – CS AI · Apr 156/10
🧠Researchers demonstrate that large language models develop attractor-like geometric patterns in their activation space when processing identity documents describing persistent agents. Experiments on Llama 3.1 and Gemma 2 show paraphrased identity descriptions cluster significantly tighter than structural controls, suggesting LLMs encode semantic agent identity as stable attractors independent of linguistic variation.
🧠 Llama
AIBullisharXiv – CS AI · Apr 156/10
🧠Researchers introduce M★, a method that automatically evolves task-specific memory systems for large language model agents by treating memory architecture as executable Python code. The approach outperforms fixed memory designs across conversation, planning, and reasoning benchmarks, suggesting that specialized memory mechanisms significantly outperform one-size-fits-all solutions.
AINeutralarXiv – CS AI · Apr 146/10
🧠Researchers propose SGH (Structured Graph Harness), a framework that replaces iterative Agent Loops with explicit directed acyclic graphs (DAGs) for LLM agent execution. The approach addresses structural weaknesses in current agent design by enforcing immutable execution plans, separating planning from recovery, and implementing strict escalation protocols, trading some flexibility for improved controllability and verifiability.
AINeutralarXiv – CS AI · Apr 106/10
🧠Researchers introduce a declarative runtime protocol that externalizes agent state to measure how much of an LLM-based agent's competence actually derives from the language model versus explicit structural components. Testing on Collaborative Battleship, they find that explicit world-model planning drives most performance gains, while sparse LLM-based revision at 4.3% of turns yields minimal and sometimes negative returns.
AIBullisharXiv – CS AI · Mar 166/10
🧠Researchers introduce a formal planning framework that maps LLM-based web agents to traditional search algorithms, enabling better diagnosis of failures in autonomous web tasks. The study compares different agent architectures using novel evaluation metrics and a dataset of 794 human-labeled trajectories from WebArena benchmark.
AINeutralarXiv – CS AI · Mar 96/10
🧠Researchers have developed ESAA-Security, a new architecture for conducting secure, verifiable audits of AI-generated code using structured agent workflows rather than unstructured LLM conversations. The system creates an immutable audit trail through event-sourcing and produces comprehensive security reports across 26 tasks and 95 executable checks.
AIBullisharXiv – CS AI · Mar 36/1012
🧠Researchers developed Self-Healing Router, a fault-tolerant system for LLM agents that reduces control-plane LLM calls by 93% while maintaining correctness. The system uses graph-based routing with automatic recovery mechanisms, treating agent decisions as routing problems rather than reasoning tasks.
$COMP
AINeutralarXiv – CS AI · Apr 74/10
🧠Researchers developed a minimal AI architecture where a 'perspective latent' creates history-dependent perception in artificial agents. The system allows identical observations to be processed differently based on accumulated experience, demonstrating measurable plasticity that persists even after conditions return to normal.