AI · Bearish · arXiv – CS AI · 16h ago · 7/10
🧠A new research paper finds that self-organizing multi-agent LLM teams significantly underperform their best individual expert members, with performance losses reaching 41.1% on ML benchmarks. The main failure is not in identifying experts but in leveraging them: teams tend toward consensus averaging rather than expertise-weighted decision-making.
AI · Bearish · arXiv – CS AI · 4d ago · 7/10
🧠Researchers introduce HARP, a methodology for measuring how harm propagates across multi-agent LLM systems when one component is compromised. Testing on a finance-oriented seven-agent system reveals that single-agent compromise creates the strongest amplification effects, while existing defenses struggle to balance security with utility costs.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers introduce TRACER, a reinforcement learning framework that enables multiple large language models to collaborate effectively on reasoning tasks by learning when to speak and what to say through turn-level decision-making. The approach addresses key challenges in multi-agent AI systems including sparse rewards, computational inefficiency, and oscillating performance, demonstrating improvements across mathematical reasoning benchmarks.
AI · Neutral · arXiv – CS AI · 5d ago · 6/10
🧠Researchers introduce Helicase, an autonomous multi-agent LLM system designed to construct supply chain knowledge graphs by synthesizing fragmented web data through multi-hop reasoning. The system incorporates uncertainty quantification across three layers to enable calibrated confidence assessment, addressing a critical gap in complex supply chain intelligence tasks that cannot be solved by single-document queries.
AI · Neutral · arXiv – CS AI · 5d ago · 6/10
🧠Researchers introduce DIANOIA, a diagnostic framework for multi-agent LLM systems that decomposes reasoning performance into three measurable channels: coverage, fidelity, and synthesis. The method enables practitioners to identify performance bottlenecks and allocate computational resources more efficiently, achieving significant improvements on multiple benchmarks.
🧠 Claude
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers propose a critique-and-routing controller for multi-agent LLM systems that iteratively refines outputs through sequential decision-making rather than one-shot routing. The method uses reinforcement learning with agent-utilization constraints to achieve performance approaching the strongest agent while reducing computational calls by over 75%, advancing coordination efficiency in heterogeneous AI systems.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce RADAR, a framework that optimizes multi-agent LLM communication structures through adaptive diffusion models, reducing token consumption while improving task accuracy. The approach moves beyond fixed communication topologies to enable dynamic, task-specific agent coordination across diverse computational problems.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce CalBench, a controlled evaluation framework for testing multi-agent LLM coordination in calendar scheduling scenarios where agents must negotiate shared commitments while protecting private information. The benchmark measures coordination quality, communication efficiency, fairness, and privacy leakage in decentralized systems where no single agent has complete information.
🏢 Meta
AI · Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠Research reveals that multi-agent LLM committees suffer from 'representational collapse' where agents produce highly similar outputs despite different role prompts, with mean cosine similarity of 0.888. A new diversity-aware consensus protocol (DALC) improves accuracy to 87% while reducing token costs by 26% compared to traditional self-consistency methods.
AI · Bullish · arXiv – CS AI · Mar 6 · 6/10
🧠Research shows that multi-agent LLM systems using models from different vendors (o4-mini, Gemini-2.5-Pro, Claude-4.5-Sonnet) significantly outperform single-vendor teams in clinical diagnosis tasks. Mixed-vendor configurations achieve superior recall and accuracy by combining complementary strengths and reducing shared biases that affect homogeneous model teams.
🧠 Claude 🧠 Gemini
AI · Neutral · arXiv – CS AI · Mar 3 · 6/10
🧠Researchers propose a graph-theoretic framework for securing multi-agent LLM systems by analyzing consensus in signed, directed interaction networks. The study addresses vulnerabilities in distributed AI architectures where hidden system prompts can act as 'topological Trojan horses' that destabilize cooperative consensus among AI agents.