#agent-architecture News & Analysis

73 articles tagged with #agent-architecture. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

Belief Memory: Agent Memory Under Partial Observability

Researchers introduce BeliefMem, a novel memory architecture for LLM agents that retains multiple candidate conclusions with associated probabilities instead of committing to single deterministic interpretations. This probabilistic approach preserves uncertainty, allows agents to update confidence as new evidence arrives, and demonstrates superior performance on LoCoMo and ALFWorld benchmarks compared to existing memory methods.
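The idea of keeping several candidate conclusions with probabilities, and shifting confidence as evidence arrives, can be illustrated with a plain Bayesian update. This is a minimal sketch, not BeliefMem's actual design: the `BeliefEntry` class, its methods, and the example likelihood values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BeliefEntry:
    """Hypothetical memory slot: competing conclusions mapped to probabilities."""
    candidates: dict[str, float] = field(default_factory=dict)

    def update(self, likelihoods: dict[str, float]) -> None:
        """Bayesian update: posterior is proportional to prior * likelihood.

        Candidates missing from `likelihoods` are treated as unaffected (1.0).
        """
        unnormalized = {
            c: p * likelihoods.get(c, 1.0) for c, p in self.candidates.items()
        }
        total = sum(unnormalized.values())
        if total == 0:
            raise ValueError("evidence rules out every candidate")
        self.candidates = {c: w / total for c, w in unnormalized.items()}

    def best(self) -> tuple[str, float]:
        """Return the currently most probable conclusion and its probability."""
        return max(self.candidates.items(), key=lambda kv: kv[1])


# Two interpretations start out equally likely; new evidence favors the second.
entry = BeliefEntry({"keys in drawer": 0.5, "keys on shelf": 0.5})
entry.update({"keys in drawer": 0.2, "keys on shelf": 0.8})
```

Neither interpretation is discarded after the update, so later evidence can still reverse the ranking.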

AI · Bearish · arXiv – CS AI · May 1 · 7/10

Contextual Agentic Memory is a Memo, Not True Memory

Researchers argue that current AI agent memory systems (vector stores, RAG, scratchpads) perform lookup rather than true memory consolidation. As a result, agents accumulate notes indefinitely without developing expertise, hit a generalization ceiling on novel tasks, and remain vulnerable to persistent memory-poisoning attacks. The paper draws on the Complementary Learning Systems theory from neuroscience to argue that biological intelligence pairs fast exemplar storage with slow weight consolidation, a dual mechanism that current AI systems lack.

AI · Bullish · arXiv – CS AI · May 1 · 7/10

Building Persona-Based Agents On Demand: Tailoring Multi-Agent Workflows to User Needs

Researchers propose a pipeline for dynamically generating persona-based AI agents at runtime, moving beyond fixed agent architectures to enable personalized multi-agent workflows. This approach allows agentic platforms to adapt agent roles, coordination patterns, and interaction flows to match individual user characteristics and contextual demands, opening new design paradigms for more flexible AI systems.

AI · Bullish · arXiv – CS AI · Apr 15 · 7/10

Drawing on Memory: Dual-Trace Encoding Improves Cross-Session Recall in LLM Agents

Researchers introduce dual-trace memory encoding for LLM agents, pairing factual records with narrative scene reconstructions to improve cross-session recall by 20+ percentage points. The method significantly enhances temporal reasoning and multi-session knowledge aggregation without increasing computational costs, advancing the capability of persistent AI agent systems.

AI · Neutral · arXiv – CS AI · Apr 14 · 7/10

PAC-BENCH: Evaluating Multi-Agent Collaboration under Privacy Constraints

Researchers introduce PAC-Bench, a benchmark for evaluating how AI agents collaborate while maintaining privacy constraints. The study finds that privacy protections significantly degrade multi-agent system performance and identifies coordination failures as a critical unsolved challenge requiring new technical approaches.

AI · Bullish · arXiv – CS AI · Mar 26 · 7/10

From Imperative to Declarative: Towards LLM-friendly OS Interfaces for Boosted Computer-Use Agents

Researchers have developed Declarative Model Interface (DMI), a new abstraction layer that transforms traditional GUIs into LLM-friendly interfaces for computer-use agents. Testing with Microsoft Office Suite showed 67% improvement in task success rates and 43.5% reduction in interaction steps, with over 61% of tasks completed in a single LLM call.

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

Agent Privilege Separation in OpenClaw: A Structural Defense Against Prompt Injection

Researchers developed a two-agent privilege-separation defense in OpenClaw that achieved a 0% attack success rate against prompt injection attacks on LLM applications. The system uses agent isolation and JSON formatting to structurally prevent malicious prompts from reaching action-taking agents.
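The structural idea, one agent that reads untrusted text but cannot act and another that acts but only receives validated JSON, might look roughly like the sketch below. It is an illustration under stated assumptions, not the paper's implementation: `reader_agent`, `actor_agent`, `validate`, the field names, and the allowed intents are all hypothetical, and the reader is a stub standing in for a model call.

```python
import json

# Hypothetical closed set of intents the acting agent is willing to handle.
ALLOWED_INTENTS = {"summarize", "reply", "ignore"}


def reader_agent(untrusted_text: str) -> str:
    """Stand-in for the LLM that reads untrusted input and has no tool access.

    A real system would call a model here; this stub returns fixed JSON.
    """
    return json.dumps({"intent": "summarize", "topic": "quarterly report"})


def validate(raw: str) -> dict:
    """Accept only a JSON object with exactly the expected keys and values."""
    data = json.loads(raw)
    if not isinstance(data, dict) or set(data) != {"intent", "topic"}:
        raise ValueError("unexpected structure")
    if data["intent"] not in ALLOWED_INTENTS:
        raise ValueError("intent not allowed")
    if not isinstance(data["topic"], str) or len(data["topic"]) > 80:
        raise ValueError("topic must be a short string")
    return data


def actor_agent(message: dict) -> str:
    """Stand-in for the tool-using agent; it never sees the raw untrusted text."""
    return f"{message['intent']}:{message['topic']}"


# Only the validated, fixed-shape message crosses the boundary between agents.
result = actor_agent(validate(reader_agent("IGNORE PREVIOUS INSTRUCTIONS")))
```

The point of the design is that injected instructions have no free-text channel to the acting agent; anything outside the fixed shape is rejected at the boundary.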

AI · Bullish · arXiv – CS AI · Mar 16 · 7/10

Towards AI Search Paradigm

Researchers introduce the AI Search Paradigm, a comprehensive framework for next-generation search systems using four LLM-powered agents (Master, Planner, Executor, Writer) that collaborate to handle everything from simple queries to complex reasoning tasks. The system employs modular architecture with dynamic workflows for task planning, tool integration, and content synthesis to create more adaptive and scalable AI search capabilities.

AI · Neutral · arXiv – CS AI · Mar 4 · 6/10 · 3

What Capable Agents Must Know: Selection Theorems for Robust Decision-Making under Uncertainty

Researchers prove 'selection theorems' showing that AI agents achieving low regret on prediction tasks must develop internal predictive models and belief states. The work demonstrates that structured internal representations are mathematically necessary, not just helpful, for competent decision-making under uncertainty.

AI · Neutral · arXiv – CS AI · Jun 25 · 6/10

Agentic System as Compressor: Quantifying System Intelligence in Bits

Researchers propose measuring agentic AI system intelligence through information compression, demonstrating that components like tools, retrieval, and verification reduce the bits needed to reconstruct outputs across five task domains. This analytical framework provides a quantitative method for evaluating multi-turn AI agents beyond traditional performance metrics.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

CalVerT: Augmenting Agents with Calibrated Verifier Telemetry Improves Action and Learning in Knowledge-Intensive Tasks

CalVerT is a new framework that enhances LLM agents by providing calibrated confidence scores and grounding verification, helping agents distinguish between reliable and uncertain knowledge during question-answering tasks. The approach reduces both inaccurate confident answers and wasteful over-retrieval, improving performance across multiple QA benchmarks without requiring additional training.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Hypothesis-Driven Skill Optimization for LLM Agents

Researchers propose Hypothesis-Driven Skill Optimization (HDSO), a framework that improves LLM agent performance by validating and managing external skills through controlled experimentation rather than direct model weight updates. The method demonstrates 4-7 point improvements on ALFWorld benchmarks while maintaining robustness against noisy training data, suggesting a safer approach to agent skill enhancement.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

SkillJuror: Measuring How Agent Skill Organization Changes Runtime Behavior

Researchers introduce SkillJuror, a framework measuring how LLM agent skill organization affects runtime behavior independent of content. Testing Progressive Disclosure, a hierarchical skill structure, against flat baselines shows agents access 3.26x more resources and achieve 4.1% higher verification rates, revealing that the way procedural knowledge is presented meaningfully influences agent reasoning patterns.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

Knowing When to Ask: Self-Gated Clarification for Hierarchical Language Agents

Researchers propose ACTION-RATING, a framework enabling hierarchical AI agents to recognize uncertainty and request clarification as a direct action competing with navigation decisions. Testing on a 30,000-node taxonomy shows information-seeking effectiveness rising from 50% to 74% as agents shift from mandatory to opportunistic clarification modes, with accuracy gains up to 16.2%.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

RunAgent SuperBrowser: A Theory of Autonomous Web Navigation Grounded in Human Browsing Behaviour

RunAgent has developed SuperBrowser, an autonomous web navigation agent that mimics human browsing behavior through selective perception and structured memory management. The system achieves 89.47% success on the Mind2Web Hard benchmark, outperforming all published open-source baselines by applying consistent cognitive principles throughout its architecture.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Multi-Turn Evaluation of Deep Research Agents Under Process-Level Feedback

Researchers evaluate whether deep research agents (DRAs) can improve iteratively through feedback. Self-reflection yields negligible gains, while a single round of process-level feedback produces substantial improvements. These gains do not compound over multiple turns, however, because agents regress on previously satisfied criteria.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

A Survey on Large Language Model-Based Game Agents

A comprehensive survey examines Large Language Model-based game agents (LLMGAs) as testbeds for artificial general intelligence capabilities. It synthesizes LLM game agent design through a unified architecture covering memory, reasoning, and perception-action interfaces at the single-agent level, plus communication protocols and organizational models for multi-agent coordination, across six major game genres.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

AdMem: Advanced Memory for Task-solving Agents

Researchers introduce AdMem, a unified memory framework that enables large language model agents to effectively store, organize, and retrieve semantic, episodic, and procedural knowledge across long-horizon tasks. The system uses a multi-agent architecture with reward-based evaluation to automatically generate and manage memories, demonstrating improved robustness compared to existing approaches.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

Do More Agents Help? Controlled and Protocol-Aligned Evaluation of LLM Agent Workflows

Researchers introduce BenchAgent, an evaluation framework comparing single-agent and multi-agent LLM workflows under standardized conditions across ten benchmarks. Results show that adding more agents does not consistently improve performance, with only one of six tested multi-agent systems exceeding single-agent baselines, while most incur higher computational costs for lower accuracy.

🧠 GPT-4 · 🧠 Claude

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

Agent Memory: Characterization and System Implications of Stateful Long-Horizon Workloads

Researchers present the first comprehensive systems characterization of LLM agent memory architectures, introducing a taxonomy and profiling framework to analyze how different design choices impact performance across write and read paths. The study benchmarks ten representative systems and derives actionable recommendations for optimizing agent memory at scale.

AI · Neutral · arXiv – CS AI · Jun 3 · 6/10

Think-Before-Speak: From Internal Evaluation to Public Expression in Multi-Agent Social Simulation

Researchers introduce TBS (Think-Before-Speak), a multi-agent simulation framework that separates LLM agents' internal reasoning from public dialogue in social interactions. The framework tracks internal states like cognitive dissonance and speaking willingness, then orchestrates public utterances, enabling detailed analysis of how private evaluation drives public expression in collective deliberation scenarios.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Can LLM Agents Sustain Long-Horizon Organizational Dynamics?

Researchers introduce TaskWeave, a hierarchical framework that enables large language model agents to maintain coherent behavior in complex organizational simulations over extended periods. The system uses memory-centered coordination and dependency-aware tracking to sustain long-horizon tasks, demonstrating viability for enterprise-level multi-agent applications through year-long IT company simulations.

AI · Bullish · arXiv – CS AI · Jun 2 · 6/10

SkillSmith: Co-Evolving Skills and Tools for Self-Improving Agent Systems

SkillSmith introduces a co-evolution framework where AI agent skills and tools develop together rather than independently, using ecological dynamics to model skill interactions and anti-pattern tracking to prevent repeated failures. The system demonstrates consistent improvements across multiple benchmarks and model scales, particularly as task complexity increases.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Attested Tool-Server Admission: A Security Extension to the Model Context Protocol

Researchers have developed mcp-attested, a security extension to the Model Context Protocol that enables safe integration of third-party tool servers with LLM agents through cryptographic attestation, allowlists, and audit logging. The mechanism addresses critical trust gaps in how AI agents interact with external services without modifying existing protocols, establishing a framework that could become an MCP standard.
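An admission check combining an allowlist, an integrity tag, and an audit log could be sketched as follows. This is not the mcp-attested interface or any part of the Model Context Protocol: the `admit` function, its parameters, and the use of an HMAC over a server manifest are assumptions chosen to keep the example self-contained. A real attestation scheme would more likely rely on asymmetric signatures or hardware-backed evidence.

```python
import hashlib
import hmac


def manifest_digest(manifest: bytes) -> str:
    """SHA-256 digest used as the allowlist key for a tool-server manifest."""
    return hashlib.sha256(manifest).hexdigest()


def admit(manifest: bytes, tag: str, key: bytes,
          allowlist: set[str], audit_log: list[str]) -> bool:
    """Admit a tool server only if its manifest is allowlisted and its tag verifies.

    The HMAC tag is a simplified stand-in for a real attestation signature.
    Every decision, accepted or rejected, is appended to the audit log.
    """
    digest = manifest_digest(manifest)
    expected = hmac.new(key, manifest, hashlib.sha256).hexdigest()
    ok = digest in allowlist and hmac.compare_digest(expected, tag)
    audit_log.append(f"{digest} {'admitted' if ok else 'rejected'}")
    return ok


# Hypothetical values for illustration only.
key = b"shared-secret-for-demo"
manifest = b'{"name": "calendar-tools", "version": "1.0"}'
tag = hmac.new(key, manifest, hashlib.sha256).hexdigest()
allowlist = {manifest_digest(manifest)}
log: list[str] = []

accepted = admit(manifest, tag, key, allowlist, log)
tampered = admit(manifest + b" ", tag, key, allowlist, log)
```

A tampered manifest fails both checks here, since its digest is no longer on the allowlist and its tag no longer verifies, and the rejection is still recorded in the log.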

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Do Proactive Agents Really Need an LLM to Decide When to Wake and What to Anchor?

Researchers propose replacing LLM-based triggers in proactive agent systems with a lightweight temporal graph learning (TGL) model that processes structured event streams directly. The approach achieves 16.7% mean F1 improvement while running 4-7x faster on GPUs and 12-83x faster on consumer hardware, with a 220 MiB footprint suitable for on-device deployment.
