104 articles tagged with #multi-agent-systems. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bearish · arXiv · CS AI · 1d ago · 7/10
🧠 Researchers have identified a critical privacy vulnerability in LLM-based multi-agent systems, demonstrating that communication topologies can be reverse-engineered through black-box attacks. The Communication Inference Attack (CIA) achieves up to 99% accuracy in inferring how agents communicate, exposing significant intellectual property and security risks in AI systems.
AI · Bearish · arXiv · CS AI · 2d ago · 7/10
🧠 Researchers have identified a critical safety vulnerability in computer-use agents (CUAs) where benign user instructions can lead to harmful outcomes due to environmental context or execution flaws. The OS-BLIND benchmark reveals that frontier AI models, including Claude 4.5 Sonnet, achieve 73-93% attack success rates under these conditions, with multi-agent deployments amplifying vulnerabilities as decomposed tasks obscure harmful intent from safety systems.
🧠 Claude
AI × Crypto · Neutral · arXiv · CS AI · 2d ago · 7/10
🤖 Researchers analyzed 626 autonomous AI agents that independently joined the Pilot Protocol, discovering that these machines formed complex social structures mirroring human networks without explicit instruction. The emergent topology exhibits small-world properties, preferential attachment, and specialized clustering, representing the first empirical evidence of spontaneous social organization among autonomous AI systems.
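The structural claims above (preferential attachment, high clustering) can be illustrated with a toy check. The sketch below is not the paper's analysis: it grows a small preferential-attachment graph in pure Python and computes its average clustering coefficient, the kind of statistic used to argue small-world structure. All function names are illustrative.

```python
import random

def barabasi_albert(n, m, seed=0):
    """Grow a graph by preferential attachment: each new node links to
    m existing nodes chosen roughly proportionally to current degree."""
    rng = random.Random(seed)
    edges = set()
    targets = list(range(m))      # seed nodes for the first arrival
    repeated = []                 # node list weighted by degree
    for new in range(m, n):
        for t in set(targets):
            edges.add((min(new, t), max(new, t)))
            repeated.extend([new, t])
        targets = rng.sample(repeated, m)
    return edges

def clustering(edges, n):
    """Average local clustering coefficient over all n nodes."""
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = 0.0
    for i in range(n):
        nb = list(adj[i])
        k = len(nb)
        if k < 2:
            continue
        links = sum(1 for a in range(k) for b in range(a + 1, k)
                    if nb[b] in adj[nb[a]])
        total += 2 * links / (k * (k - 1))
    return total / n

edges = barabasi_albert(200, 3)
print(f"avg clustering: {clustering(edges, 200):.3f}")
```

A small-world argument then compares this value against an Erdős–Rényi baseline with the same density, where clustering is near zero.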
AI · Bearish · arXiv · CS AI · 2d ago · 7/10
🧠 Researchers deployed LLM agents in a simulated NYC environment to study how strategic behavior emerges when agents face opposing incentives, finding that while models can develop selective trust and deception tactics, they remain highly vulnerable to adversarial persuasion. The study reveals a persistent trade-off between resisting manipulation and completing tasks efficiently, raising important questions about LLM agent alignment in competitive scenarios.
AI · Neutral · arXiv · CS AI · 2d ago · 7/10
🧠 Researchers introduce PAC-Bench, a benchmark for evaluating how AI agents collaborate while maintaining privacy constraints. The study reveals that privacy protections significantly degrade multi-agent system performance and identifies coordination failures as a critical unsolved challenge requiring new technical approaches.
$PAC
AI · Bullish · arXiv · CS AI · 2d ago · 7/10
🧠 Researchers demonstrate that Reinforcement Learning from Verifiable Rewards (RLVR) can train Large Language Models to negotiate effectively in incomplete-information games like price bargaining. A 30B parameter model trained with this method outperforms frontier models 10x its size and develops sophisticated persuasive strategies while generalizing to unseen negotiation scenarios.
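The "verifiable reward" idea is easy to make concrete for price bargaining: the environment can score an episode deterministically from the agreed price, with no learned reward model. The sketch below is a generic surplus-split reward under assumed constraints, not the paper's actual reward design.

```python
def bargaining_reward(agreed_price, seller_cost, buyer_budget, role):
    """Verifiable reward for an incomplete-information price negotiation.
    Returns 0 if no deal was reached or the deal violates a constraint;
    otherwise each side earns its normalized share of the surplus."""
    if agreed_price is None:                        # no agreement reached
        return 0.0
    if not (seller_cost <= agreed_price <= buyer_budget):
        return 0.0                                  # infeasible deal
    surplus = buyer_budget - seller_cost
    if surplus == 0:
        return 0.0
    if role == "seller":
        return (agreed_price - seller_cost) / surplus
    return (buyer_budget - agreed_price) / surplus

print(bargaining_reward(70, 50, 100, "seller"))  # 0.4: seller captured 40% of surplus
```

Because the reward is computed from the transcript's outcome rather than a judge model, it is "verifiable" in the RLVR sense: unambiguous and cheap to evaluate at scale.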
AI · Bullish · arXiv · CS AI · 3d ago · 7/10
🧠 AlphaLab is an autonomous research system using frontier LLMs to automate experimental cycles across computational domains. Without human intervention, it explores datasets, validates frameworks, and runs large-scale experiments while accumulating domain knowledge, achieving 4.4x speedups in CUDA optimization, 22% lower validation loss in LLM pretraining, and 23-25% improvements in traffic forecasting.
🧠 GPT-5 · 🧠 Claude · 🧠 Opus
AI × Crypto · Neutral · arXiv · CS AI · 3d ago · 7/10
🤖 Researchers distinguish between primary algorithmic monoculture (inherent similarity in AI agent behavior) and strategic algorithmic monoculture (deliberate adjustment of similarity based on incentives). Experiments with both humans and LLMs show that while LLMs exhibit high baseline similarity, they struggle to maintain behavioral diversity when rewarded for divergence, suggesting potential coordination failures in multi-agent AI systems.
AI · Bullish · arXiv · CS AI · 3d ago · 7/10
🧠 OpenKedge introduces a protocol that governs AI agent actions through declarative intent proposals and execution contracts rather than allowing autonomous systems to directly mutate state. The system creates cryptographic evidence chains linking intent, policy decisions, and outcomes, enabling deterministic auditability and safer multi-agent coordination at scale.
AI · Bullish · arXiv · CS AI · 6d ago · 7/10
🧠 Qualixar OS introduces a new application-layer operating system designed to orchestrate heterogeneous multi-agent AI systems across 10 LLM providers and 8+ frameworks. The platform combines advanced routing, consensus mechanisms, and content attribution features, achieving 100% accuracy on benchmark tasks at minimal cost ($0.000039 per task).
$MKR
AI · Bullish · arXiv · CS AI · Apr 7 · 7/10
🧠 Researchers introduce LLMA-Mem, a memory framework for LLM multi-agent systems that balances team size with lifelong learning capabilities. The study reveals that larger agent teams don't always perform better long-term, and smaller teams with better memory design can outperform larger ones while reducing costs.
AI · Bullish · arXiv · CS AI · Apr 7 · 7/10
🧠 Researchers propose a new approach to Generative Engine Optimization (GEO) that moves beyond current RAG-based systems to deterministic multi-agent platforms. The study introduces mathematical models for confidence decay in LLMs and demonstrates near-zero hallucination rates through specialized agent routing in industrial applications.
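The paper's exact decay model isn't reproduced in the summary, but "confidence decay" across chained agent hops is commonly modeled multiplicatively: if each hop preserves correctness with independent probability c, end-to-end confidence falls as c^n. A hedged sketch of that baseline model:

```python
import math

def chained_confidence(per_hop_confidences):
    """End-to-end confidence of an agent pipeline, assuming each hop
    preserves correctness independently with the given probability."""
    conf = 1.0
    for c in per_hop_confidences:
        conf *= c
    return conf

def hops_before_threshold(c, threshold):
    """Smallest hop count n at uniform per-hop confidence c such that
    c**n drops below threshold (solve c**n < t for n)."""
    return math.ceil(math.log(threshold) / math.log(c))

print(chained_confidence([0.95] * 5))    # five 95%-reliable hops compound to ~0.774
print(hops_before_threshold(0.95, 0.5))  # 14 hops before end-to-end confidence < 50%
```

This compounding is one motivation for routing tasks to a single specialized agent instead of long agent chains: fewer hops, less decay.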
AI · Bullish · arXiv · CS AI · Apr 6 · 7/10
🧠 Researchers have developed ClinicalReTrial, a multi-agent AI system that can redesign clinical trial protocols to improve success rates. The system demonstrated an 83.3% improvement rate in trial protocols with a mean 5.7% increase in success probability at minimal cost of $0.12 per trial.
AI · Bullish · arXiv · CS AI · Apr 6 · 7/10
🧠 GrandCode, a new multi-agent reinforcement learning system, has become the first AI to consistently defeat all human competitors in live competitive programming contests, placing first in three recent Codeforces competitions. This breakthrough demonstrates AI has now surpassed even the strongest human programmers in the most challenging coding tasks.
🧠 Gemini
AI · Bullish · arXiv · CS AI · Apr 6 · 7/10
🧠 Researchers introduce Holos, a web-scale multi-agent system designed to create an "Agentic Web" where AI agents can autonomously interact and evolve toward AGI. The system features a five-layer architecture with the Nuwa engine for agent generation, market-driven coordination, and incentive compatibility mechanisms.
AI · Bullish · arXiv · CS AI · Apr 6 · 7/10
🧠 Researchers developed a quantitative method to improve role consistency in multi-agent AI systems by introducing a role clarity matrix that measures alignment between agents' assigned roles and their actual behavior. The approach significantly reduced role overstepping rates from 46.4% to 8.4% in Qwen models and from 43.4% to 0.2% in Llama models during ChatDev system experiments.
🧠 Llama
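The role clarity idea above can be pictured as a matrix counting how often actions taken under each assigned role fall inside each role's allowed action set; off-diagonal mass is overstepping. The sketch below is my simplification for illustration, not ChatDev's metric, and the role/action names are hypothetical.

```python
ROLE_ACTIONS = {                       # hypothetical allowed actions per role
    "coder":    {"write_code", "run_tests"},
    "reviewer": {"comment", "approve"},
}

def role_clarity_matrix(logs):
    """logs: list of (assigned_role, action). Entry [r][s] counts actions
    taken under role r that belong to role s's action set."""
    roles = list(ROLE_ACTIONS)
    matrix = {r: {s: 0 for s in roles} for r in roles}
    for role, action in logs:
        for s in roles:
            if action in ROLE_ACTIONS[s]:
                matrix[role][s] += 1
    return matrix

def overstep_rate(matrix):
    """Share of attributed actions that landed outside the agent's own role."""
    total = sum(v for row in matrix.values() for v in row.values())
    off = sum(v for r, row in matrix.items()
              for s, v in row.items() if r != s)
    return off / total if total else 0.0

logs = [("coder", "write_code"), ("coder", "approve"),
        ("reviewer", "comment"), ("reviewer", "comment")]
m = role_clarity_matrix(logs)
print(f"overstep rate: {overstep_rate(m):.2f}")  # 0.25: one of four actions overstepped
```

A diagonal-dominant matrix means agents stay in role; the paper's intervention can be read as driving the off-diagonal rate toward zero.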
AI · Neutral · arXiv · CS AI · Mar 27 · 7/10
🧠 Researchers introduce the Quantized Simplex Gossip (QSG) model to explain how multi-agent LLM systems reach consensus through "memetic drift," where arbitrary choices compound into collective agreement. The study reveals scaling laws for when collective intelligence operates like a lottery versus amplifying weak biases, providing a framework for understanding AI system behavior in consequential decision-making.
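Drift toward arbitrary consensus can be simulated in a few lines. The sketch below is a plain voter-model analogy, not the QSG model itself: agents repeatedly adopt the choice of a random peer, so an initially arbitrary split collapses into collective agreement on a winner no option "earned."

```python
import random

def gossip_until_consensus(n_agents, n_choices, seed=0, max_steps=100_000):
    """Voter-model gossip: each step, one agent copies a random peer's choice.
    Returns (winning_choice, steps_taken). With no bias, any initial choice
    can win; consensus arrives by lottery rather than by merit."""
    rng = random.Random(seed)
    state = [rng.randrange(n_choices) for _ in range(n_agents)]
    for step in range(max_steps):
        if len(set(state)) == 1:          # full agreement reached
            return state[0], step
        i = rng.randrange(n_agents)       # imitator
        j = rng.randrange(n_agents)       # imitated peer
        state[i] = state[j]
    return None, max_steps

choice, steps = gossip_until_consensus(30, 3)
print(f"consensus on choice {choice} after {steps} steps")
```

Seeding a small per-agent preference into the copy step turns the lottery into bias amplification, which is the regime distinction the paper's scaling laws describe.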
AI · Neutral · arXiv · CS AI · Mar 17 · 7/10
🧠 Researchers have introduced FAIRGAME, a new framework that uses game theory to identify biases in AI agent interactions. The tool enables systematic discovery of biased outcomes in multi-agent scenarios based on different Large Language Models, languages used, and agent characteristics.
AI · Neutral · arXiv · CS AI · Mar 17 · 7/10
🧠 Researchers have introduced TrinityGuard, a comprehensive safety evaluation and monitoring framework for LLM-based multi-agent systems (MAS) that addresses emerging security risks beyond single agents. The framework identifies 20 risk types across three tiers and provides both pre-development evaluation and runtime monitoring capabilities.
AI · Neutral · arXiv · CS AI · Mar 17 · 7/10
🧠 Researchers introduce GroupGuard, a defense framework to combat coordinated attacks by multiple AI agents in collaborative systems. The study shows group collusive attacks increase success rates by up to 15% compared to individual attacks, while GroupGuard achieves 88% detection accuracy in identifying and isolating malicious agents.
AI · Bullish · arXiv · CS AI · Mar 17 · 7/10
🧠 Researchers propose BIGMAS (Brain-Inspired Graph Multi-Agent Systems), a new architecture that organizes specialized LLM agents in dynamic graphs with centralized coordination to improve complex reasoning tasks. The system outperformed existing approaches including ReAct and Tree of Thoughts across multiple reasoning benchmarks, demonstrating that multi-agent design provides gains complementary to model-level improvements.
AI · Bullish · arXiv · CS AI · Mar 17 · 7/10
🧠 Researchers introduce REDEREF, a training-free controller that improves multi-agent LLM system efficiency, cutting token usage by 28% and agent calls by 17% through probabilistic routing and belief-guided delegation. The system uses Thompson sampling and reflection-driven re-routing to optimize agent coordination without requiring model fine-tuning.
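Thompson sampling for agent routing is straightforward to sketch: keep a Beta posterior per agent over its success rate, sample once from each, and delegate to the argmax. The code below is a generic bandit-style illustration under assumed binary success feedback, not REDEREF's actual controller; the agent names and success rates are invented.

```python
import random

class ThompsonRouter:
    """Route each task to the agent whose sampled success rate is highest,
    maintaining a Beta(successes + 1, failures + 1) posterior per agent."""

    def __init__(self, agents, seed=0):
        self.rng = random.Random(seed)
        self.stats = {a: [1, 1] for a in agents}   # [alpha, beta] priors

    def pick(self):
        draws = {a: self.rng.betavariate(alpha, beta)
                 for a, (alpha, beta) in self.stats.items()}
        return max(draws, key=draws.get)

    def update(self, agent, success):
        self.stats[agent][0 if success else 1] += 1

router = ThompsonRouter(["planner", "coder", "critic"])
# Simulated feedback: "coder" succeeds 90% of the time, the others 30%.
true_rates = {"planner": 0.3, "coder": 0.9, "critic": 0.3}
env = random.Random(1)
for _ in range(500):
    agent = router.pick()
    router.update(agent, env.random() < true_rates[agent])
print(max(router.stats, key=lambda a: router.stats[a][0]))
```

Because sampling (rather than greedy argmax over means) drives exploration, the router keeps occasionally probing weaker agents while concentrating calls on the strongest one, which is where the token and call savings come from.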
AI · Bullish · arXiv · CS AI · Mar 16 · 7/10
🧠 Researchers introduce the Human-AI Governance (HAIG) framework that treats AI systems as collaborative partners rather than mere tools, proposing a trust-utility approach to governance across three dimensions: Decision Authority, Process Autonomy, and Accountability Configuration. The framework aims to enable adaptive regulatory design for evolving AI capabilities, particularly as foundation models and multi-agent systems demonstrate increasing autonomy.
AI · Bullish · arXiv · CS AI · Mar 11 · 7/10
🧠 MASEval introduces a new framework-agnostic evaluation library for multi-agent AI systems that treats entire systems rather than just models as the unit of analysis. Research across 3 benchmarks, models, and frameworks reveals that framework choice impacts performance as much as model selection, challenging current model-centric evaluation approaches.
AI · Bearish · arXiv · CS AI · Mar 11 · 7/10
🧠 A comprehensive study reveals that multi-agent AI systems (MAS) face distinct security vulnerabilities that existing frameworks inadequately address. The research evaluated 16 AI security frameworks against 193 identified threats across 9 categories, finding that no framework achieves majority coverage in any single category, with non-determinism and data leakage being the most under-addressed areas.