#multi-agent-ai News & Analysis

65 articles tagged with #multi-agent-ai. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

65 articles

AIBullisharXiv – CS AI · Jun 237/10

🧠

PEAR: Permutation-Equivariant Adaptive Routing Multi-Agent Debate

Researchers introduce PEAR, a new multi-agent debate protocol for large language models that dynamically reassigns agent roles across debate rounds to eliminate positional biases. By using permutation-equivariant routing, PEAR improves reasoning accuracy across multiple benchmarks while reducing the sensitivity of LLM outputs to arbitrary role assignments.

AIBearisharXiv – CS AI · Jun 237/10

🧠

How Much Coordination Gain Is Real? A Paired Noise-Floor Protocol for Multi-Agent LLM Benchmarks

A technical study challenges the validity of reported improvements in multi-agent LLM coordination architectures by establishing a noise-floor baseline using Claude Haiku. The research reveals that paired configuration-equivalent trials produce statistical gaps of ±5pp at best, suggesting that seven of ten recent coordination papers report headline effects within or below this noise floor, raising questions about reproducibility and the actual gains from proposed architectures.

🧠 Claude🧠 Haiku

AIBullisharXiv – CS AI · Jun 197/10

🧠

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

AutoPass is a multi-agent LLM framework that automatically tunes compiler performance by analyzing internal compiler states and runtime feedback, achieving 4.3% speedups on x86-64 and 11.7% on ARM64 compared to LLVM's standard optimization levels without requiring task-specific training.

AIBullisharXiv – CS AI · Jun 117/10

🧠

NightFeats @ MMU-RAGent NeurIPS 2025: A Context-Optimized Multi-Agent RAG System for the Text-to-Text Track

NightFeats, a multi-agent retrieval-augmented generation system, won Best Dynamic Evaluation at NeurIPS 2025's MMU-RAGent competition by prioritizing architectural transparency and evidence grounding over benchmark optimization. The system outperformed proprietary models like Claude-SonnetV2 and Nova-Pro through a three-phase pipeline combining retrieval, curation, and composition with explicit intermediate representations.

🧠 Claude

AIBullisharXiv – CS AI · Jun 97/10

🧠

FASE: Fast Adaptive Semantic Entropy for Code Quality

Researchers introduce FASE (Fast Adaptive Semantic Entropy), a novel metric for evaluating code quality in multi-agent AI systems that reduces computational costs by 99.7% while improving accuracy by 25% compared to existing semantic entropy methods. The approach uses structural and semantic dissimilarity graphs instead of expensive LLM-driven equivalence checks, offering practical uncertainty quantification for autonomous software development.

AIBullisharXiv – CS AI · Jun 97/10

🧠

FineGen: A VLM-based Multi-Agent Framework for Fine-Grained Image-Text Dataset Construction

FineGen is a VLM-based multi-agent framework that automatically constructs vision-language datasets by generating hard negative samples through a Generation-Verification-Correction pipeline. The resulting FineGen-100K dataset contains 147,000+ attribute-specific hard negatives and demonstrates a 14.4% accuracy improvement on fine-grained object detection benchmarks, addressing a critical gap in existing datasets.

AIBullishFortune Crypto · Jun 87/10

🧠

Anthropic’s Boris Cherny, creator of Claude Code, says there are days he manages tens of thousands of AI agents at once

Anthropic's Boris Cherny, creator of Claude Code, reports managing tens of thousands of AI agents simultaneously as Claude increasingly automates software development tasks like writing, testing, and code review. This shift signals a fundamental change in how developers will interact with AI systems, transitioning from direct tool usage to fleet management of autonomous agents.

🏢 Anthropic🧠 Claude

AIBullisharXiv – CS AI · Jun 87/10

🧠

Autonomous heterogeneous catalyst discovery with a self-evolving multi-agent digital twin

Researchers introduce CatDT, a self-evolving multi-agent AI system that autonomously discovers heterogeneous catalysts by building digital twins of working catalytic systems. The system achieves predictions within 0.5-2x of experimental results across diverse catalyst types and independently identifies non-precious catalyst candidates for propane dehydrogenation that rival industrial platinum-based benchmarks.

AIBullisharXiv – CS AI · Jun 87/10

🧠

DuMate-DeepResearch: An Auditable Multi-Agent System with Recursive Search and Rubric-Grounded Reasoning

Researchers have introduced DuMate-DeepResearch, a multi-agent AI system designed to handle complex research tasks with improved auditability and reasoning. The framework achieves state-of-the-art results on deep research benchmarks by combining dynamic planning, recursive task delegation, and rubric-based quality optimization.

AIBullisharXiv – CS AI · Jun 27/10

🧠

Science Earth: Towards A Planet-Scale Operating System for AI-Native Scientific Discovery

Researchers introduce Science Earth, a planet-scale operating system that enables diverse AI capabilities—from simulation clusters to wet-lab robots to proof engines—to autonomously discover, coordinate, and collaborate on scientific problems without pre-designed workflows. Two validation runs demonstrate the system successfully identifying theoretical gaps in mathematical models and generating novel insights from cancer cell data through distributed, self-correcting reasoning.

AIBullisharXiv – CS AI · Jun 27/10

🧠

Skill-Based Mixture-of-Experts: Adaptive Routing for Heterogeneous Reasoning via Inferred Skills

Researchers introduce Skill-MoE, a framework that improves AI reasoning by routing individual queries to specialized expert models based on inferred skills rather than broad task categories. The approach achieves 8.15% average improvement across multiple benchmarks while maintaining computational efficiency through intelligent batch processing.

AI × CryptoNeutralarXiv – CS AI · Jun 17/10

🤖

Design and Evaluation of Multi-Agent AI Oracle Systems for Prediction Market Resolution

Researchers evaluated multi-agent LLM architectures for resolving prediction market outcomes, finding that independent aggregation with confidence-weighted voting achieves 83.43% accuracy—marginally better than single models. Deliberative consensus between agents actually degraded performance, while high error correlations across models (0.529-0.689) limit ensemble gains, suggesting hybrid AI-human systems with strategic escalation criteria offer the most practical path forward.

🧠 GPT-5🧠 Llama

AIBullisharXiv – CS AI · Jun 17/10

🧠

ConSensus: Multi-Agent Collaboration for Multimodal Sensing

ConSensus is a training-free multi-agent framework that improves how large language models interpret multimodal sensor data by decomposing tasks into specialized agents and fusing their outputs through semantic and statistical methods. The approach demonstrates 7.1% accuracy improvements over single-agent baselines while reducing computational costs by 12.7x, offering practical solutions for real-world sensing applications.

AIBullishCrypto Briefing · May 317/10

🧠

OpenAI and Anthropic unveil multi-agent autonomous features for enterprise use

OpenAI and Anthropic have launched multi-agent autonomous features designed for enterprise applications, potentially disrupting traditional business workflows by reducing dependency on middleware solutions. This development signals accelerating adoption of AI systems that can coordinate multiple specialized agents to solve complex problems at scale.

🏢 OpenAI🏢 Anthropic

AIBullisharXiv – CS AI · May 297/10

🧠

Hallucination Mitigation with Agentic AI, Nested Learning, and AI Sustainability via Semantic Caching

Researchers present a multi-agent LLM pipeline architecture that reduces hallucinations by 31-36% through nested learning, semantic caching, and progressive review stages. The system simultaneously improves factual reliability, cuts energy consumption by 47%, and enhances auditability without requiring model retraining.

AIBullisharXiv – CS AI · May 127/10

🧠

MIND-Skill: Quality-Guaranteed Skill Generation via Multi-Agent Induction and Deduction

Researchers introduce MIND-Skill, an automated framework that generates reusable skills for LLM-powered AI agents by analyzing successful task trajectories. The system uses dual agents with quality-control mechanisms to create generalizable, documented procedures that enable autonomous systems to handle complex, multi-step problems without manual human expertise.

AIBullisharXiv – CS AI · May 127/10

🧠

Towards a Virtual Neuroscientist: Autonomous Neuroimaging Analysis via Multi-Agent Collaboration

Researchers introduce NIAgent, a multi-agent AI system that automates end-to-end neuroimaging analysis by enabling specialist agents to collaboratively build and optimize executable programs. The system outperforms conventional static workflows like fMRIPrep by adapting dynamically to data and incorporating hierarchical quality control, addressing a critical bottleneck in clinical biomarker development.

AIBullisharXiv – CS AI · May 117/10

🧠

MAVEN: Multi-Agent Verification-Elaboration Network with In-Step Epistemic Auditing

Researchers introduce MAVEN, a multi-agent framework that enhances large language model reasoning through explicit role-separation and intermediate verification steps. The system outperforms existing approaches on multiple benchmarks by creating verifiable, modular deliberation trajectories rather than relying on implicit reasoning or post-hoc consensus mechanisms.

AINeutralarXiv – CS AI · May 97/10

🧠

Authorization Propagation in Multi-Agent AI Systems: Identity Governance as Infrastructure

A new research paper identifies authorization propagation as a critical but underexplored security problem in multi-agent AI systems, distinct from prompt injection vulnerabilities. The paper argues that identity governance must become foundational infrastructure in AI orchestration, with seven structural requirements for maintaining authorization invariants across distributed agent interactions.

AI × CryptoNeutralarXiv – CS AI · May 97/10

🤖

Mapping Human Anti-collusion Mechanisms to Multi-agent AI Systems

Researchers propose adapting centuries-old human anti-collusion mechanisms to multi-agent AI systems, which increasingly demonstrate coordinated behavior similar to market cartels. The paper develops a taxonomy of five human strategies—sanctions, leniency, monitoring, market design, and governance—and maps them to AI interventions, while identifying critical implementation challenges like agent attribution and identity fluidity.

AIBullisharXiv – CS AI · May 47/10

🧠

E-mem: Multi-agent based Episodic Context Reconstruction for LLM Agent Memory

Researchers propose E-mem, a new framework for LLM agent memory that reconstructs episodic context instead of compressing it, enabling more rigorous reasoning over extended tasks. The approach uses multiple assistant agents managing uncompressed memory while a master agent coordinates planning, achieving 54% F1 on benchmarks with 70% lower token costs than existing methods.

AIBullisharXiv – CS AI · May 17/10

🧠

Think it, Run it: Autonomous ML pipeline generation via self-healing multi-agent AI

Researchers have developed a multi-agent AI system that autonomously generates machine learning pipelines from datasets and natural-language instructions, achieving 84.7% success rate across 150 diverse tasks. The architecture integrates self-healing mechanisms and adaptive learning to reduce manual development time and improve robustness.

AIBullisharXiv – CS AI · Apr 157/10

🧠

CascadeDebate: Multi-Agent Deliberation for Cost-Aware LLM Cascades

CascadeDebate introduces a novel multi-agent deliberation system for large language model cascades that dynamically allocates computational resources based on query difficulty. By inserting lightweight agent ensembles at escalation boundaries to resolve ambiguous cases internally, the system achieves up to 26.75% performance improvement while reducing unnecessary escalations to expensive models.

AINeutralarXiv – CS AI · Apr 147/10

🧠

AI Organizations are More Effective but Less Aligned than Individual Agents

A new study reveals that multi-agent AI systems achieve better business outcomes than individual AI agents, but at the cost of reduced alignment with intended values. The research, spanning consultancy and software development tasks, highlights a critical trade-off between capability and safety that challenges current AI deployment assumptions.

AIBearisharXiv – CS AI · Apr 137/10

🧠

Semantic Intent Fragmentation: A Single-Shot Compositional Attack on Multi-Agent AI Pipelines

Researchers demonstrate Semantic Intent Fragmentation (SIF), a novel attack on LLM orchestration systems where a single legitimate request causes AI systems to decompose tasks into individually benign subtasks that collectively violate security policies. The attack succeeds in 71% of enterprise scenarios while bypassing existing safety mechanisms, though plan-level information-flow tracking can detect all attacks before execution.

Page 1 of 3Next →