#multi-agent-llm News & Analysis

16 articles tagged with #multi-agent-llm. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · Jun 4 · 7/10

Topology Matters: Measuring Memory Leakage in Multi-Agent LLMs

Researchers introduce MAMA, a framework measuring how network topology affects private information leakage in multi-agent LLM systems. The study demonstrates that denser connectivity and shorter distances between attackers and targets significantly increase memory leakage, with practical implications for securing distributed AI systems.

AI · Bearish · arXiv – CS AI · Jun 1 · 7/10

Multi-Agent Teams Hold Experts Back

A new research paper reveals that self-organizing multi-agent LLM teams significantly underperform their best individual expert members, with performance losses reaching 41.1% on ML benchmarks. The primary failure lies not in identifying experts but in leveraging them: teams tend toward consensus-averaging rather than expertise-weighted decision-making.
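A toy illustration of the failure mode described above, assuming a numeric estimation task in which one member is the known expert; the estimates and weights are made-up numbers, not data from the paper:

```python
def consensus_average(estimates: list[float]) -> float:
    """Unweighted mean: every member counts equally, expert or not."""
    return sum(estimates) / len(estimates)


def expertise_weighted(estimates: list[float], weights: list[float]) -> float:
    """Weighted mean: members with more expertise pull the answer harder."""
    total = sum(weights)
    return sum(e * w for e, w in zip(estimates, weights)) / total


# Hypothetical team: the expert (first member) answers 10.0, two non-experts answer 4.0.
estimates = [10.0, 4.0, 4.0]
averaged = consensus_average(estimates)
weighted = expertise_weighted(estimates, [8, 1, 1])
```

Plain averaging pulls the team answer from the expert's 10.0 down to 6.0, while weighting the expert more heavily keeps it at 8.8.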

AI · Bearish · arXiv – CS AI · May 28 · 7/10

HARP: Measuring Harm Amplification in Multi-Agent LLM Systems

Researchers introduce HARP, a methodology for measuring how harm propagates across multi-agent LLM systems when one component is compromised. Testing on a finance-oriented seven-agent system reveals that single-agent compromise creates the strongest amplification effects, while existing defenses struggle to balance security with utility costs.

AI · Neutral · arXiv – CS AI · Jun 25 · 6/10

Agentic Knowledge Tracing: A Multi-Agent LLM Architecture for Stealth Assessment of Financial Literacy in Serious Games

Researchers developed Agentic BKT, a multi-agent LLM system that assesses financial literacy in educational games without disrupting gameplay. The architecture uses specialized AI agents to evaluate player decisions across four financial competency domains and, in a validation study with 193 K-12 participants, showed significantly higher predictive validity than single-LLM approaches.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Hallucination as Context Drift: Synchronization Protocols for Multi-Agent LLM Systems

Researchers propose that hallucinations in multi-agent LLM systems stem from context drift—misaligned knowledge states between concurrent agents—rather than model deficiencies alone. They introduce the Context Divergence Score and Shared State Verification Protocol to synchronize agent states efficiently, achieving 34% fewer hallucinations than naive broadcast methods while using 58% fewer API calls.

🧠 Claude

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

"So There's a Catch-22 Here": How Early Adopters Who Build Multi-Agent LLM Systems Conceptualize Transparency

Researchers conducted interviews with 13 early adopters building multi-agent LLM systems at a major technology organization to understand how they conceptualize and practice transparency. The study identifies five key transparency frameworks—reproducibility, debugging, boundary-setting, visualization, and auditing—revealing that transparency in distributed AI architectures is understood as a situated socio-technical practice rather than a single standardized concept.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

DPBench: Structural Determinants of Multi-Agent LLM Coordination Under Simultaneous Resource Contention

DPBench introduces a benchmark for testing multi-agent LLM coordination using the Dining Philosophers problem, revealing that deadlock rates vary dramatically (25%-90%) across models under identical conditions. The research demonstrates that coordination success is primarily determined by protocol design—including communication structure and concurrency primitives—rather than model capability alone.

🧠 GPT-5 · 🧠 Claude · 🧠 Opus
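To show why protocol design can dominate, here is a deterministic lockstep simulation of the classic Dining Philosophers setup using scripted (non-LLM) agents. It is a sketch under stated assumptions, not DPBench's harness: each round, every hungry philosopher requests its next fork at the same time, a free fork goes to the lowest-numbered requester, and the only thing that varies is the acquisition protocol (left fork first versus a global fork ordering, the textbook deadlock-avoidance fix). The function names are hypothetical.

```python
def fork_order(p: int, n: int, ordered: bool) -> tuple[int, int]:
    """Which fork philosopher p tries to grab first and second."""
    left, right = p, (p + 1) % n
    if ordered:
        return (min(left, right), max(left, right))  # global resource ordering
    return (left, right)  # naive protocol: everyone grabs the left fork first


def run(n: int, ordered: bool, max_rounds: int = 100) -> str:
    owner: list[int | None] = [None] * n  # owner[f] = philosopher holding fork f
    done = [False] * n
    for _ in range(max_rounds):
        changed = False
        # Phase 1: every hungry philosopher simultaneously requests its next fork.
        requests: dict[int, list[int]] = {}
        for p in range(n):
            if done[p]:
                continue
            first, second = fork_order(p, n, ordered)
            want = first if owner[first] != p else second
            if owner[want] is None:
                requests.setdefault(want, []).append(p)
        for fork, requesters in requests.items():
            owner[fork] = min(requesters)  # tie-break: lowest id wins
            changed = True
        # Phase 2: anyone holding both forks eats once, then releases them.
        for p in range(n):
            first, second = fork_order(p, n, ordered)
            if not done[p] and owner[first] == p and owner[second] == p:
                done[p] = True
                owner[first] = owner[second] = None
                changed = True
        if all(done):
            return "all ate"
        if not changed:
            return "deadlock"  # nobody can make progress
    return "timeout"
```

With five philosophers, left-first acquisition deadlocks after one round because every philosopher holds one fork and waits for a neighbor's; under the ordered protocol, the same agents all eventually eat.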

AI · Neutral · arXiv – CS AI · May 28 · 6/10

TRACER: Turn-level Regret Matching with Inner Reinforcement Credit for Cooperative Multi-LLM Reasoning

Researchers introduce TRACER, a reinforcement learning framework that enables multiple large language models to collaborate effectively on reasoning tasks by learning when to speak and what to say through turn-level decision-making. The approach addresses key challenges in multi-agent AI systems including sparse rewards, computational inefficiency, and oscillating performance, demonstrating improvements across mathematical reasoning benchmarks.
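TRACER builds on regret matching. The generic algorithm (Hart and Mas-Colell's regret matching, shown here in its full-information form) plays each action with probability proportional to its positive cumulative regret. The sketch below applies it to a hypothetical two-action turn decision, "speak" versus "silent", with made-up fixed utilities. It omits TRACER's turn-level credit assignment and inner reinforcement signal entirely.

```python
from typing import Callable


def regret_matching(
    utility: Callable[[str, int], float], actions: list[str], rounds: int
) -> dict[str, float]:
    """Return the average strategy after `rounds` of full-information regret matching."""
    regrets = {a: 0.0 for a in actions}
    strategy_sum = {a: 0.0 for a in actions}
    for t in range(rounds):
        # Play proportionally to positive regret; fall back to uniform when none is positive.
        positive = {a: max(r, 0.0) for a, r in regrets.items()}
        total = sum(positive.values())
        strategy = {
            a: (positive[a] / total if total > 0 else 1.0 / len(actions)) for a in actions
        }
        for a in actions:
            strategy_sum[a] += strategy[a]
        # Regret of each action = its utility minus the current strategy's expected utility.
        payoffs = {a: utility(a, t) for a in actions}
        expected = sum(strategy[a] * payoffs[a] for a in actions)
        for a in actions:
            regrets[a] += payoffs[a] - expected
    n = sum(strategy_sum.values())
    return {a: s / n for a, s in strategy_sum.items()}


# Hypothetical utilities: contributing a turn is always worth more than staying silent.
avg = regret_matching(lambda a, t: 1.0 if a == "speak" else 0.0, ["speak", "silent"], 100)
```

Because speaking always pays more here, the average strategy converges toward always speaking (about 0.995 after 100 rounds, since only the first round is played uniformly).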

AI · Neutral · arXiv – CS AI · May 27 · 6/10

Helicase: Uncertainty-Guided Supply Chain Knowledge Graph Construction with Autonomous Multi-Agent LLMs

Researchers introduce Helicase, an autonomous multi-agent LLM system designed to construct supply chain knowledge graphs by synthesizing fragmented web data through multi-hop reasoning. The system incorporates uncertainty quantification across three layers to enable calibrated confidence assessment, addressing a critical gap in complex supply chain intelligence tasks that cannot be solved by single-document queries.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

DIANOIA: Diagnostic Decomposition and Joint Optimization for Multi-Agent Reasoning

Researchers introduce DIANOIA, a diagnostic framework for multi-agent LLM systems that decomposes reasoning performance into three measurable channels: coverage, fidelity, and synthesis. The method enables practitioners to identify performance bottlenecks and allocate computational resources more efficiently, achieving significant improvements on multiple benchmarks.

🧠 Claude

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Iterative Critique-and-Routing Controller for Multi-Agent Systems with Heterogeneous LLMs

Researchers propose a critique-and-routing controller for multi-agent LLM systems that iteratively refines outputs through sequential decision-making rather than one-shot routing. The method uses reinforcement learning with agent-utilization constraints to achieve performance approaching the strongest agent while reducing computational calls by over 75%, advancing coordination efficiency in heterogeneous AI systems.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

RADAR: Redundancy-Aware Diffusion for Multi-Agent Communication Structure Generation

Researchers introduce RADAR, a framework that optimizes multi-agent LLM communication structures through adaptive diffusion models, reducing token consumption while improving task accuracy. The approach moves beyond fixed communication topologies to enable dynamic, task-specific agent coordination across diverse computational problems.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

CalBench: Evaluating Coordination-Privacy Trade-offs in Multi-Agent LLMs

Researchers introduce CalBench, a controlled evaluation framework for testing multi-agent LLM coordination in calendar scheduling scenarios where agents must negotiate shared commitments while protecting private information. The benchmark measures coordination quality, communication efficiency, fairness, and privacy leakage in decentralized systems where no single agent has complete information.

🏢 Meta

AI · Bullish · arXiv – CS AI · Apr 7 · 6/10

Representational Collapse in Multi-Agent LLM Committees: Measurement and Diversity-Aware Consensus

Research reveals that multi-agent LLM committees suffer from 'representational collapse' where agents produce highly similar outputs despite different role prompts, with mean cosine similarity of 0.888. A new diversity-aware consensus protocol (DALC) improves accuracy to 87% while reducing token costs by 26% compared to traditional self-consistency methods.
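The reported 0.888 figure is a mean cosine similarity. Here is a minimal sketch of one common way to compute such a score, assuming it is averaged over all pairs of agent-output embeddings. The embedding model is unspecified, so the vectors below are toy inputs.

```python
import math
from itertools import combinations


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def mean_pairwise_cosine(embeddings: list[list[float]]) -> float:
    """Average cosine similarity over every unordered pair of agent outputs."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Toy committees: fully collapsed (identical outputs) vs. maximally diverse (orthogonal).
collapsed = [[1.0, 2.0, 3.0]] * 3
diverse = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

A score near 1.0 means the role-prompted agents are effectively saying the same thing; a score near 0.0 means their outputs point in unrelated directions.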

AI · Bullish · arXiv – CS AI · Mar 6 · 6/10

Do Mixed-Vendor Multi-Agent LLMs Improve Clinical Diagnosis?

Research shows that multi-agent LLM systems using models from different vendors (o4-mini, Gemini-2.5-Pro, Claude-4.5-Sonnet) significantly outperform single-vendor teams in clinical diagnosis tasks. Mixed-vendor configurations achieve superior recall and accuracy by combining complementary strengths and reducing shared biases that affect homogeneous model teams.

🧠 Claude · 🧠 Gemini

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10

Graph-theoretic Agreement Framework for Multi-agent LLM Systems

Researchers propose a graph-theoretic framework for securing multi-agent LLM systems by analyzing consensus in signed, directed interaction networks. The study addresses vulnerabilities in distributed AI architectures where hidden system prompts can act as 'topological Trojan horses' that destabilize cooperative consensus among AI agents.
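For intuition about consensus on signed networks, here is a forward-Euler simulation of the Altafini model, a standard signed-consensus dynamic. Whether the paper uses this exact model is an assumption. Positive edges pull a node's state toward its neighbor's, and negative (antagonistic) edges pull it toward the negation of its neighbor's state. `A[i][j]` is the hypothetical influence weight of agent j on agent i.

```python
def signed_consensus(
    A: list[list[float]], x0: list[float], dt: float = 0.1, steps: int = 300
) -> list[float]:
    """Integrate dx_i/dt = -sum_j |a_ij| * (x_i - sign(a_ij) * x_j) with forward Euler."""
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        dx = [0.0] * n
        for i in range(n):
            for j in range(n):
                a = A[i][j]
                if a != 0:
                    sign = 1.0 if a > 0 else -1.0
                    dx[i] -= abs(a) * (x[i] - sign * x[j])
        x = [xi + dt * d for xi, d in zip(x, dx)]
    return x


x0 = [1.0, 0.0, -0.4]
# All-cooperative network: every pair of agents trusts each other.
cooperative = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
# Agent 2 is antagonistic to agents 0 and 1 (a structurally balanced split).
antagonistic = [[0, 1, -1], [1, 0, -1], [-1, -1, 0]]

x_coop = signed_consensus(cooperative, x0)
x_split = signed_consensus(antagonistic, x0)
```

With all-positive edges, the three agents agree on 0.2, the initial average. Making agent 2 antagonistic to the other two yields bipartite consensus: agents 0 and 1 settle near 0.467 and agent 2 near -0.467. In Altafini's analysis of connected undirected graphs, structurally unbalanced sign patterns drive every state to zero.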