#agent-architecture News & Analysis

73 articles tagged with #agent-architecture. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.


AI · Bearish · arXiv – CS AI · Jun 23 · 7/10

Memory Contagion: Cross-Temporal Propagation of Evaluator Bias via Agent Memory

Researchers identify 'Memory Contagion,' a phenomenon where biased evaluator feedback propagates through LLM agent memory systems into future iterations, even with perfect consolidation. The study demonstrates that bias contamination occurs at rates as low as 20% and has differential effects depending on bias type, exposing a critical vulnerability in current agent memory architectures.

AI · Bearish · arXiv – CS AI · Jun 23 · 7/10

Plans Don't Persist: Why Context Management Is Load Bearing for LLM Agents

Researchers demonstrate that large language model agents fail to maintain plans as persistent internal state, instead relying on plans remaining in the context window. Using diagnostic techniques on Llama-3.1-70B and DeepSeek-R1, the study shows plan signal decays rapidly when compressed out of context, with practical implications for agent reliability in long-horizon tasks.

🧠 Llama

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

LedgerAgent: Structured State for Policy-Adherent Tool-Calling Agents

LedgerAgent is a new inference-time method that improves how AI agents handle customer-service tasks by maintaining explicit task states in a separate ledger rather than reconstructing context from prompts. The approach reduces policy violations and improves decision consistency across multiple trials by validating state-dependent constraints before executing tool calls.
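The ledger idea in this summary can be illustrated with a minimal sketch. It assumes a toy customer-service setting; `Ledger`, `Constraint`, `guarded_call`, and both tools are hypothetical names made up for illustration and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Ledger:
    """Explicit task state kept outside the prompt (hypothetical structure)."""
    facts: Dict[str, object] = field(default_factory=dict)


@dataclass
class Constraint:
    """A state-dependent rule attached to one tool (hypothetical structure)."""
    description: str
    applies_to: str
    check: Callable[[Ledger, dict], bool]


def guarded_call(ledger: Ledger, constraints: List[Constraint],
                 tool_name: str, args: dict,
                 tools: Dict[str, Callable[..., dict]]) -> dict:
    """Validate constraints against the ledger before executing the tool."""
    violations = [c.description for c in constraints
                  if c.applies_to == tool_name and not c.check(ledger, args)]
    if violations:
        # The tool is never executed when a constraint fails.
        return {"ok": False, "violations": violations}
    result = tools[tool_name](**args)
    ledger.facts.update(result)  # record the outcome as explicit state
    return {"ok": True, "result": result}


# Toy tools standing in for real customer-service APIs (assumptions).
def verify_identity(user_id: str) -> dict:
    return {"identity_verified": True, "user_id": user_id}


def issue_refund(order_id: str, amount: float) -> dict:
    return {"refunded_order": order_id, "refund_amount": amount}


TOOLS = {"verify_identity": verify_identity, "issue_refund": issue_refund}

CONSTRAINTS = [
    Constraint("identity must be verified before a refund", "issue_refund",
               lambda ledger, args: ledger.facts.get("identity_verified") is True),
    Constraint("refund amount must not exceed 100", "issue_refund",
               lambda ledger, args: args["amount"] <= 100),
]
```

In this sketch a refund requested before `verify_identity` has run is rejected without touching the refund tool, because the check reads the ledger rather than whatever the model reconstructs from its prompt.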

AI · Bullish · arXiv – CS AI · Jun 10 · 7/10

A History-Aware Visually Grounded Critic for Computer Use Agents

Researchers introduce HiViG, a test-time framework that enhances Computer Use Agents through history-aware and visually grounded critic models. The system improves GUI task performance by 5.8-9.0% across web, mobile, and desktop platforms by maintaining action history and verifying execution coordinates against visual interfaces.

🧠 Gemini

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

Anything2Skill: Compiling External Knowledge into Reusable Skills for Agents

Researchers introduce Anything2Skill, a framework that converts external knowledge sources into reusable, executable skills for AI agents. By combining skill extraction with retrieval-augmented generation, the system achieves 98.85% success on command-line tasks and 94.10% on GitHub operations, significantly outperforming RAG-only approaches.

AI · Bullish · arXiv – CS AI · Jun 8 · 7/10

DuMate-DeepResearch: An Auditable Multi-Agent System with Recursive Search and Rubric-Grounded Reasoning

Researchers have introduced DuMate-DeepResearch, a multi-agent AI system designed to handle complex research tasks with improved auditability and reasoning. The framework achieves state-of-the-art results on deep research benchmarks by combining dynamic planning, recursive task delegation, and rubric-based quality optimization.

AI · Neutral · arXiv – CS AI · Jun 8 · 7/10

The Three-Ring Architecture: Governing Agents in the Era of On-Platform Organisations

A research paper proposes the Three-Ring Architecture as a governance framework for enterprise AI deployment, arguing that organizations deploying agentic AI systems lack adequate control infrastructure. The framework separates deterministic, strategies-based agents (Ring 2) from non-deterministic LLM-based agents (Ring 3), positioning Ring 2 as essential operating-system-level governance to prevent the 95% project failure rate seen in previous AI deployment waves.

AI · Bullish · arXiv – CS AI · Jun 5 · 7/10

Beyond Semantic Organization: Memory as Execution State Management for Long-Horizon Agents

Researchers introduce MAGE, a novel memory management system for LLM-based agents that organizes task histories as hierarchical state trees rather than semantic similarity clusters. The approach achieves 7.8-20.4 percentage point improvements in task success rates while reducing token consumption by 55.1% on long-horizon tasks with interdependent decisions.

AI · Bullish · arXiv – CS AI · Jun 5 · 7/10

ABBEL: Learning Natural-Language Belief States for Memory-Efficient Interaction

ABBEL is a new recursive summarization framework that enables AI agents to maintain memory-efficient interaction histories by storing information as natural-language belief states rather than full context. The approach uses reinforcement learning techniques to improve belief generation quality, achieving 40% better performance than prior memory-constrained agents while using 67% less memory.
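The recursive belief-state loop described here can be sketched roughly as follows. In the summary an LLM writes the belief and reinforcement learning improves it; neither is shown. `run_with_belief_state` and `toy_update` are hypothetical stand-ins, and the `key=value` observation format is an assumption chosen only to keep the example runnable.

```python
from typing import Callable, Iterable


def run_with_belief_state(observations: Iterable[str],
                          update_belief: Callable[[str, str], str],
                          initial_belief: str = "") -> str:
    """Carry forward only a compact belief string, never the full history."""
    belief = initial_belief
    for observation in observations:
        # Each step sees just the previous belief and the newest observation.
        belief = update_belief(belief, observation)
    return belief


def toy_update(belief: str, observation: str) -> str:
    """Stand-in for a learned belief generator (assumption, not the paper's model).

    Observations look like "key=value"; the belief keeps the latest value per key,
    so its size depends on the number of distinct keys, not on history length.
    """
    state = dict(item.split("=", 1) for item in belief.split("; ") if item)
    key, value = observation.split("=", 1)
    state[key] = value
    return "; ".join(f"{k}={v}" for k, v in sorted(state.items()))
```

Calling `run_with_belief_state(["door=closed", "key=found", "door=open"], toy_update)` folds three observations into a two-entry belief, which shows why memory stays bounded as the interaction grows.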

AI · Bullish · arXiv – CS AI · Jun 5 · 7/10

LatentSkill: From In-Context Textual Skills to In-Weight Latent Skills for LLM Agents

Researchers introduce LatentSkill, a framework that converts textual skills into efficient LoRA adapters for LLM agents, storing knowledge in model weights rather than context prompts. The approach reduces token overhead by 64-72% while improving task performance, enabling more scalable and modular AI agent systems.

AI · Bullish · arXiv – CS AI · Jun 4 · 7/10

Parthenon Law: A Self-Evolving Legal-Agent Framework

Researchers introduce Parthenon, a self-evolving legal-agent framework that addresses critical limitations in deploying AI agents for complex legal work. Through analysis of 12,510 agent trajectories, the study reveals that even frontier LLMs struggle with end-to-end legal task completion, prompting the development of a modular architecture that learns from failures without retraining underlying models.

AI · Bullish · arXiv – CS AI · Jun 4 · 7/10

Scaling Self-Evolving Agents via Parametric Memory

Researchers introduce TMEM, a parametric memory framework that enables AI agents to learn and evolve within a single episode by updating LoRA weights online, rather than merely retrieving frozen memories. The approach combines explicit memory storage with fast adaptive weights, allowing agents to genuinely improve their policy during rollouts, and it demonstrates consistent performance gains across multiple benchmarks.

AI · Bullish · arXiv – CS AI · Jun 3 · 7/10

SkillDAG: Self-Evolving Typed Skill Graphs for LLM Skill Selection at Scale

SkillDAG introduces a typed directed graph system that models inter-skill relationships for LLM agents, enabling dynamic skill selection and structural learning during execution. The approach significantly outperforms existing baselines on ALFWorld and SkillsBench benchmarks, achieving 67.1% success and 27.3% reward by treating skill selection as a structural problem rather than a similarity-matching one.

🧠 GPT-5

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

POIROT: Interrogating Agents for Failure Detection in Multi-Agent Systems

Researchers introduce POIROT, a protocol that uses multi-agent LLM systems to audit themselves for failures rather than relying on external evaluators. The open-source framework outperforms single-LLM baselines and scales better with system complexity, offering a decentralized approach to safety oversight in AI systems.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

ToolSelf: Unifying Task Execution and Self-Reconfiguration via Tool-Driven Emergent Adaptation

ToolSelf introduces a runtime self-reconfiguration paradigm for LLM-powered agents that dynamically adapts task execution strategies during operation rather than relying on static pre-execution configurations. The approach unifies configuration updates with task execution through a standardized tool interface, achieving 28.8-point performance gains over static baselines after Configuration-Aware Two-stage Training.

AI · Bullish · arXiv – CS AI · Jun 1 · 7/10

SpecDB: LLM-Generated Customized Databases via Feature-Oriented Decomposition

SpecDB is an AI system that uses large language models to automatically generate customized relational databases tailored to specific workloads, rather than deploying uniform database systems across all use cases. The generated databases achieve comparable performance to PostgreSQL and MySQL while using only 3% of their code size, demonstrating the viability of AI-driven, purpose-built database synthesis.

AI · Bearish · arXiv – CS AI · May 29 · 7/10

Relevance as a Vulnerability: How Web Retrieval Degrades Safety Alignment in LLM Agents

Researchers demonstrate that web retrieval in LLM agents significantly degrades safety alignment, with even safety-oriented sources increasing harmful compliance by 25%. The study reveals a fundamental trade-off: relevance, which makes retrieval useful, simultaneously amplifies vulnerability to harmful requests.

AI × Crypto · Bearish · arXiv – CS AI · May 29 · 7/10

Dissociative Identity: Language Model Agents Lack Grounding for Reputation Mechanisms

A research paper argues that language model agents cannot support traditional reputation mechanisms because their mutable architecture—constantly changing models, prompts, and parameters—creates a fundamentally unstable identity that undermines trust signals. The authors propose shifting from identity-based, retroactive governance systems to protocol-based behavioral controls that operate before agents act.

AI · Bullish · arXiv – CS AI · May 28 · 7/10

LACUNA: Safe Agents as Recursive Program Holes

LACUNA is a new programming model that allows LLM agents to write code that shapes their own runtime environment while maintaining safety through type-checking and validation. The system rejects unsafe code before execution and uses compiler diagnostics to drive retries, achieving competitive performance on benchmark tests while preventing prompt injection and tool misuse attacks.

AI · Bullish · arXiv – CS AI · May 28 · 7/10

MemCog: From Memory-as-Tool to Memory-as-Cognition in Conversational Agents

Researchers introduce MemCog, a new memory system for conversational AI agents that integrates memory access into the reasoning process rather than treating it as a separate tool. The system uses associative link graphs and proactive reasoning to enable agents to autonomously explore relevant information, achieving state-of-the-art performance on multiple benchmarks including a newly created ProactiveMemBench.

AI · Bearish · arXiv – CS AI · May 27 · 7/10

RepoMirage: Probing Repository Context Reasoning in Code Agents with Perturbations

Researchers introduce RepoMirage, an evaluation suite that tests whether code agents truly understand repository context by applying perturbations to challenge their reasoning abilities. The study reveals a significant gap in how agents handle complex, multi-file code tasks, with performance dropping from 66.8% to 25.3% when explicit structural understanding is required.

AI · Neutral · arXiv – CS AI · May 12 · 7/10

MATRA: Modeling the Attack Surface of Agentic AI Systems -- OpenClaw Case Study

Researchers introduce MATRA, a threat modeling framework designed to systematically assess security risks in autonomous AI agent systems. The framework combines asset-based impact analysis with attack trees to quantify how LLM vulnerabilities translate into real-world deployment risks, demonstrating its effectiveness on an OpenClaw personal agent case study.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

MIND-Skill: Quality-Guaranteed Skill Generation via Multi-Agent Induction and Deduction

Researchers introduce MIND-Skill, an automated framework that generates reusable skills for LLM-powered AI agents by analyzing successful task trajectories. The system uses dual agents with quality-control mechanisms to create generalizable, documented procedures that enable autonomous systems to handle complex, multi-step problems without manual human expertise.

AI · Neutral · arXiv – CS AI · May 11 · 7/10

Self-Programmed Execution for Language-Model Agents

Researchers introduce Self-Programmed Execution (SPE), a novel agent architecture where language models act as their own orchestrators rather than following fixed turn-by-turn policies. The approach uses Spell, a Lisp-based language enabling self-editing programs, and demonstrates that frontier models can perform complex agentic tasks without specialized training.

AI · Neutral · arXiv – CS AI · May 11 · 7/10

Position: Agent Should Invoke External Tools ONLY When Epistemically Necessary

Researchers propose that AI agents should invoke external tools only when epistemically necessary—when internal reasoning cannot reliably complete a task. The Theory of Agent framework treats tool use as a decision under uncertainty rather than a simple action optimization problem, arguing that unnecessary delegation wastes resources and prevents development of internal reasoning capabilities.
