#multi-agent-reinforcement-learning News & Analysis

22 articles tagged with #multi-agent-reinforcement-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.


AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

🧠

Superhuman Safe and Agile Racing through Multi-Agent Reinforcement Learning

Researchers demonstrate that multi-agent reinforcement learning enables autonomous quadrotor drones to achieve superhuman racing performance while improving safety by 50% compared to single-agent systems. The breakthrough shows that training agents through competitive interaction with diverse opponents produces robust real-world coordination capabilities that generalize to human pilots without additional safety constraints.

AI · Bearish · arXiv – CS AI · Jun 10 · 7/10

🧠

Failure Modes of Deep Multi-Agent RL in Asynchronous Pricing: Reproducible Triggers, Trace Diagnostics, and a Partial Fix

Researchers identify two critical failure modes in deep multi-agent reinforcement learning applied to continuous pricing markets: tacit collusion between DDPG agents and actor-critic instability at high event rates. While asynchronous pricing and latency reduce collusion by up to 48%, the fix remains partial and breaks down under high-frequency conditions, revealing fundamental limitations in current MARL approaches for market simulation.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

🧠

Distilling LLM Reasoning into an Interpretable Policy Tree for Human-AI Collaboration

Researchers introduce Collaboration Policy Tree (Co-pi-tree), a method that distills large language model reasoning into interpretable, executable policy trees for human-AI collaboration. The approach achieves a 35% performance improvement while reducing LLM queries by 78% and latency by 97%, addressing key limitations of black-box reinforcement learning and costly real-time LLM querying.

AI · Neutral · arXiv – CS AI · Jun 2 · 7/10

🧠

Network Distributed Multi-Agent Reinforcement Learning for Consensus Control of Quadcopters

Researchers propose Network Distributed Multi-Agent Reinforcement Learning (ND-MARL), a framework that enables quadcopter swarms to achieve consensus control using only local 2-neighbor communication. The approach demonstrates zero-shot scalability, with policies trained on 3 agents successfully deployed to swarms of up to 250 agents without retraining, marking a significant advancement in distributed autonomous systems.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

🧠

Decoupled Delay Compensation: Enhancing Pre-trained MARL Policies via Learned Dynamics Filtering

Researchers propose a modular state-estimation layer that enhances pre-trained multi-agent reinforcement learning (MARL) policies by compensating for communication delays and packet loss through learned dynamics filtering. The plug-and-play approach combines gated transition models with Kalman filtering to estimate current states from delayed observations, demonstrating significant robustness improvements without requiring retraining of original policies.
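
To make the idea of estimating a current state from a delayed observation concrete, here is a minimal Python sketch. It is not the paper's method: it swaps the learned gated transition model and Kalman filter for a fixed constant-velocity model and only performs open-loop prediction. The function name `predict_forward`, the time step `dt`, and the sample values are all hypothetical.

```python
def predict_forward(position, velocity, delay_steps, dt=0.1):
    """Roll a delayed (position, velocity) observation forward to "now".

    Assumption: a fixed constant-velocity model, x_{k+1} = x_k + v_k * dt,
    stands in for the learned transition model described in the summary.
    """
    for _ in range(delay_steps):
        position = position + velocity * dt
    return position, velocity


# Hypothetical observation that arrived three time steps late.
delayed_position, delayed_velocity = 1.0, 2.0
est_position, est_velocity = predict_forward(
    delayed_position, delayed_velocity, delay_steps=3
)
```

A pre-trained policy would then be fed the estimate in place of the stale observation, which is why a layer of this kind can be added without retraining the policy itself.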

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

🧠

HALyPO: Heterogeneous-Agent Lyapunov Policy Optimization for Human-Robot Collaboration

Researchers developed HALyPO (Heterogeneous-Agent Lyapunov Policy Optimization), a new approach to improve stability in human-robot collaboration through multi-agent reinforcement learning. The method addresses the 'rationality gap' between human and robot learning by using Lyapunov stability conditions to prevent policy oscillations and divergence during training.

AI · Neutral · arXiv – CS AI · Jun 25 · 6/10

🧠

GCT-MARL: Graph-Based Contrastive Transfer for Sample-Efficient Cooperative Multi-Agent Reinforcement Learning

Researchers introduce GCT-MARL, a transfer learning framework for multi-agent reinforcement learning that enables faster training across different environments by combining graph-based contrastive learning with adaptive alignment techniques. The method demonstrates significant convergence improvements over from-scratch training in both homogeneous and heterogeneous agent scenarios, while supporting continual learning across sequential tasks.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

🧠

Sim2O: Efficient Offline-to-Online MARL via Joint Action Composition

Researchers introduce Sim2O, a new framework for offline-to-online multi-agent reinforcement learning (MARL) that combines offline and online action proposals through dynamic blending rather than monolithic joint decisions. The minimalist approach leverages centralized value functions to identify high-value coordination strategies without auxiliary training, demonstrating significant performance improvements over existing baselines.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

🧠

A Unified Framework for Locality in Scalable MARL

Researchers present a unified mathematical framework for certifying locality in scalable multi-agent reinforcement learning (MARL) systems by decomposing the state-transition matrix into environment and policy sensitivity components. The approach uses spectral radius analysis to weaken prior Dobrushin bounds and applies temperature-scaled softmax policies to control locality, enabling exponentially decaying truncation bias in networked agent systems.
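
For intuition about the temperature-scaled softmax policies mentioned in this summary, here is a minimal Python sketch. It is not taken from the paper; the function name `softmax_policy` and the example logits are hypothetical. It only illustrates that a higher temperature flattens the action distribution, so the policy responds less sharply to changes in its inputs.

```python
import math


def softmax_policy(logits, temperature=1.0):
    """Return action probabilities for the given logits and temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical action preferences
sharp = softmax_policy(logits, temperature=0.5)   # concentrates on the best action
smooth = softmax_policy(logits, temperature=5.0)  # close to uniform
```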

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

🧠

Coordination Graphs for Constrained Multi-Agent Reinforcement Learning

Researchers introduce CG-CMARL, a framework combining coordination graphs with Lagrangian duality to solve constrained multi-agent reinforcement learning problems. The approach decomposes complex joint action spaces into manageable pairwise regions, enabling scalability to larger agent teams while maintaining convergence guarantees and allowing dynamic Pareto front tracing without retraining.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

🧠

LLM-Guided Communication for Cooperative Multi-Agent Reinforcement Learning

Researchers propose LMAC, an LLM-driven communication protocol for multi-agent reinforcement learning that enables agents to reconstruct shared state information more accurately and uniformly. The approach iteratively refines communication strategies using explicit state-awareness criteria, demonstrating substantial performance improvements over existing communication baselines across multiple MARL benchmarks.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

🧠

Dreaming Of Others: Latent Teammate Modeling In World Models For Multi-Agent Reinforcement Learning

Researchers propose a novel architecture for multi-agent reinforcement learning that models teammates as learnable components within a world model, using a Theory-of-Mind head to infer partner behavior and enable zero-shot coordination. This approach extends Dreamer-style models beyond single-agent settings by factorizing latent states into environment and teammate representations, potentially advancing cooperative AI systems.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

🧠

Interaction-Breaking Adversarial Learning Framework for Robust Multi-Agent Reinforcement Learning

Researchers propose IBAL, an adversarial learning framework that makes multi-agent reinforcement learning systems robust against attacks that disrupt agent coordination through observation and action perturbations. The method addresses a gap in existing defenses by focusing on interaction-breaking attacks rather than value-oriented ones, demonstrating improved resilience across multiple scenarios.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

🧠

Randomness is sometimes necessary for coordination

Researchers propose Diamond Attention, a neural architecture using structured randomness to enable role differentiation in multi-agent reinforcement learning systems with identical agents. The method achieves perfect coordination on symmetric games and generalizes zero-shot across different team sizes, demonstrating that protocol-structured randomness—not noise—is essential for solving coordination problems in homogeneous agent systems.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

🧠

Dynamic one-time delivery of critical data by small and sparse UAV swarms: a model problem for MARL scaling studies

Researchers introduce a family of deterministic games designed to test Multi-Agent Reinforcement Learning (MARL) scalability for decentralized UAV swarm control tasked with relaying critical data. While baseline policies using Dijkstra's algorithm perform comparably to standard MARL algorithms for small agent counts, existing MARL approaches demonstrate significant scalability limitations as swarm size increases.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

🧠

Discovering Multiagent Learning Algorithms with Large Language Models

Researchers deployed AlphaEvolve, an LLM-powered evolutionary coding framework, to automatically discover new multi-agent reinforcement learning algorithms for imperfect-information games. The system produced two competitive algorithms (VAD-CFR and SHOR-PSRO) that match human-designed baselines, but further analysis revealed that distilled, minimal versions (WOP-CFR and PM-PSRO) generalize better with simpler structures, demonstrating that LLM-discovered complexity often obscures fundamental algorithmic principles.

AI · Neutral · arXiv – CS AI · May 7 · 6/10

🧠

Overcoming Environmental Meta-Stationarity in MARL via Adaptive Curriculum and Counterfactual Group Advantage

Researchers propose CL-MARL, a curriculum learning framework for multi-agent reinforcement learning that dynamically adjusts task difficulty based on agent performance, addressing a fundamental limitation where fixed-difficulty training constrains policy generalization. The method achieves a 40% win rate on complex cooperative tasks, outperforming existing baselines by significant margins.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

🧠

MA-VLCM: A Vision Language Critic Model for Value Estimation of Policies in Multi-Agent Team Settings

Researchers propose MA-VLCM, a framework that uses pretrained vision-language models as centralized critics in multi-agent reinforcement learning instead of learning critics from scratch. This approach significantly improves sample efficiency and enables zero-shot generalization while producing compact policies suitable for resource-constrained robots.

AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 20

🧠

Training Generalizable Collaborative Agents via Strategic Risk Aversion

Researchers developed a new multi-agent reinforcement learning algorithm that uses strategic risk aversion to create AI agents that can reliably collaborate with unseen partners. The approach addresses the problem of brittle AI collaboration systems that fail when working with new partners by incorporating robustness against behavioral deviations.

AI · Neutral · arXiv – CS AI · Mar 11 · 4/10

🧠

Cooperative Game-Theoretic Credit Assignment for Multi-Agent Policy Gradients via the Core

Researchers propose CORA, a new cooperative game-theoretic method for credit assignment in multi-agent reinforcement learning that uses coalition-wise advantage allocation. The approach addresses policy optimization challenges by evaluating marginal contributions of different agent coalitions and demonstrates superior performance across various benchmarks.

AI · Neutral · arXiv – CS AI · Mar 3 · 4/10 · 3

🧠

Multi-Agent Reinforcement Learning with Communication-Constrained Priors

Researchers propose a new multi-agent reinforcement learning framework that addresses communication constraints in real-world scenarios. The approach uses communication-constrained priors to distinguish between lossy and lossless messages, improving learning effectiveness in complex environments with unreliable communication.

AI · Neutral · arXiv – CS AI · Mar 3 · 4/10 · 5

🧠

Feasible Pairings for Decentralized Integral Controllability of Non-Square Systems

Researchers develop a mathematical framework for decentralized control of non-square systems, with applications extending to Multi-Agent Reinforcement Learning (MARL) environments. The work introduces D-stability concepts for non-square matrices and proposes methods to identify stable control pairings for distributed AI architectures.
