#multi-agent-learning News & Analysis

16 articles tagged with #multi-agent-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

Formal Verification of Learned Multi-Agent Communication Policies via Decision Tree Distillation

Researchers present the first formal verification framework for multi-agent reinforcement learning (MARL) communication policies by distilling neural networks into interpretable decision trees and verifying them with probabilistic model checking. The approach achieves 97.9% fidelity to original policies while enabling safety verification for critical robotic applications like drone swarms and autonomous vehicle fleets.
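A fidelity figure like the one quoted above is commonly read as the share of states on which the distilled tree picks the same action as the original network. The minimal sketch below assumes that reading; the two policies, the action names, and the state grid are hypothetical stand-ins, not the paper's models or data.

```python
from typing import Callable, Hashable, Sequence, Tuple

State = Tuple[float, float]
Policy = Callable[[State], Hashable]


def fidelity(teacher: Policy, student: Policy, states: Sequence[State]) -> float:
    """Fraction of states on which the student picks the teacher's action."""
    if not states:
        raise ValueError("need at least one state")
    matches = sum(1 for s in states if teacher(s) == student(s))
    return matches / len(states)


# Hypothetical stand-in for the learned (neural) communication policy.
def teacher_policy(state: State) -> str:
    x, y = state
    return "send" if x + 0.1 * y > 0.5 else "silent"


# Hypothetical stand-in for the distilled tree: a single split on x.
def tree_policy(state: State) -> str:
    x, _ = state
    return "send" if x > 0.5 else "silent"


# Evaluate agreement on a small grid of illustrative states.
states = [(i / 10, j / 10) for i in range(11) for j in range(11)]
score = fidelity(teacher_policy, tree_policy, states)
print(f"fidelity: {score:.3f}")
```

Only the distilled tree is handed to the model checker, so the fidelity score indicates how far guarantees proved on the tree can be expected to carry over to the original policy.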

AI · Bullish · arXiv – CS AI · Mar 5 · 6/10

Cognition to Control - Multi-Agent Learning for Human-Humanoid Collaborative Transport

Researchers developed a new three-layer hierarchy called cognition-to-control (C2C) for human-robot collaboration that combines vision-language models with multi-agent reinforcement learning. The system enables sustained deliberation and planning while maintaining real-time control for collaborative manipulation tasks between humans and humanoid robots.

AI · Neutral · arXiv – CS AI · Jun 25 · 6/10

Offline Multi-agent Continual Cooperation via Skill Partition and Reuse

Researchers introduce COMAD, a framework for multi-agent reinforcement learning systems to continually discover and reuse coordination skills from offline data without catastrophic forgetting. The approach uses skill partitioning and density-based reusability estimation to enable agents to efficiently transfer knowledge across sequential tasks in open environments.

AI · Bullish · arXiv – CS AI · Jun 23 · 6/10

Decentralized Autonomous Traffic Management through Corridor Networks

Researchers have developed a decentralized multi-agent reinforcement learning approach to manage autonomous aircraft traffic in Advanced Air Mobility (AAM) corridor networks without centralized coordination. The system successfully generalizes policies trained on single corridors to complex multi-corridor scenarios with merges, splits, and varying traffic conditions, suggesting scalable solutions for future autonomous aviation infrastructure.

AI · Bullish · arXiv – CS AI · Jun 11 · 6/10

CCKS: Consensus-based Communication and Knowledge Sharing

Researchers propose CCKS, a consensus-based framework for improving multi-agent reinforcement learning through smarter knowledge sharing between agents. The approach uses contrastive learning to build consensus models that allow agents to selectively adopt teacher guidance, demonstrating significant performance improvements in complex environments like Google Research Football and StarCraft II.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Continual Quadruped Robots Coordination via Semantic Skill Discovery

Researchers present Conquer, a semantic skill-library framework that lets teams of quadruped robots learn new coordination tasks sequentially without forgetting previously acquired skills. The system uses a variable-cardinality architecture and semantic descriptors to retrieve and adapt existing skills for new tasks, achieving a 95.6% success rate in simulation and real-world validation on Unitree Go2 robots.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Unsupervised Partner Design Enables Robust Ad-hoc Teamwork

Researchers introduce Unsupervised Partner Design (UPD), a multi-agent reinforcement learning method that generates and adaptively selects training partners without requiring pre-trained populations or manual tuning. The approach demonstrates strong performance across multiple benchmarks and achieves higher human preference ratings for adaptability and naturalness compared to existing baselines.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

Emergent Language as an Approach to Conscious AI

Researchers propose emergent language in multi-agent reinforcement learning as a methodology for studying artificial consciousness. Agents develop communication from minimal constraints, which tests whether consciousness-relevant structures arise from task demands rather than from human language biases. In a proof of concept, agents spontaneously develop self-referential communication and an echo-mismatch detection mechanism, suggesting cognitive emergence rather than inherited patterns.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

Regret Minimization with Adaptive Opponents in Repeated Games

Researchers introduce Repeated Policy Regret (RP-Regret), a new game-theoretic metric for analyzing regret minimization in repeated games with adaptive opponents who can respond to historical play. The paper proposes three algorithms to minimize RP-Regret despite its non-convex nature and demonstrates that when all players use these algorithms, certain subgame perfect equilibria can be learned, with experiments showing improved cooperation in games like Stag-Hunt.
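For context on why cooperation in Stag-Hunt is hard to learn, the sketch below enumerates the game's pure-strategy Nash equilibria. The payoff values are illustrative assumptions, and the code does not implement RP-Regret or any of the paper's algorithms.

```python
from itertools import product

ACTIONS = ("stag", "hare")

# Illustrative payoffs (row player, column player); assumed values, not from the paper.
PAYOFFS = {
    ("stag", "stag"): (4, 4),  # both cooperate: best joint outcome
    ("stag", "hare"): (0, 3),  # lone stag hunter gets nothing
    ("hare", "stag"): (3, 0),
    ("hare", "hare"): (3, 3),  # safe but lower payoff
}


def is_pure_nash(profile):
    """True if neither player gains by unilaterally switching action."""
    a1, a2 = profile
    u1, u2 = PAYOFFS[profile]
    row_ok = all(PAYOFFS[(d, a2)][0] <= u1 for d in ACTIONS)
    col_ok = all(PAYOFFS[(a1, d)][1] <= u2 for d in ACTIONS)
    return row_ok and col_ok


equilibria = [p for p in product(ACTIONS, repeat=2) if is_pure_nash(p)]
print(equilibria)  # [('stag', 'stag'), ('hare', 'hare')]
```

Both outcomes are stable, but only (stag, stag) is payoff-dominant, so learners can settle on the safe (hare, hare) outcome unless they account for how the opponent adapts to past play.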

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

TriAlign: Towards Universal Truth Consistency in Personalized LLM Alignment

Researchers introduce TriAlign, a machine learning framework that addresses fairness issues in personalized large language models by ensuring universal truths remain consistent across different social groups. The method balances accuracy, fairness, and personalization through multi-agent reinforcement learning, reducing disparities in objective task performance while maintaining user preference adaptation.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Recurrent Structural Policy Gradient for Partially Observable Mean Field Games

Researchers introduce Recurrent Structural Policy Gradient (RSPG), an algorithmic advancement for solving Mean Field Games with partial observability by combining policy gradient methods with structural knowledge of system dynamics. The method achieves significantly faster convergence than model-free approaches while enabling history-aware behavior, accompanied by MFAX, a new JAX-based research framework for MFG implementations.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

Personalized Observation Normalization for Federated Reinforcement Learning in Simulation Environments with Heterogeneity

Researchers propose a Personalized Observation Normalization (PON) method to address challenges in federated reinforcement learning across heterogeneous environments. The technique allows individual agents to maintain localized normalization statistics while collaborating on a shared policy, improving training efficiency and performance without compromising privacy.
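The idea of keeping normalization statistics local can be sketched with a per-agent running mean and variance (Welford's algorithm). This is a minimal illustration under assumptions: the `RunningNormalizer` class, the scalar observations, and the two example agents are hypothetical and not taken from the paper.

```python
import math


class RunningNormalizer:
    """Per-agent running mean/variance (Welford); the statistics stay on the agent."""

    def __init__(self, eps: float = 1e-8) -> None:
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.eps = eps

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def normalize(self, x: float) -> float:
        # Fall back to unit variance until at least two samples are seen.
        var = self.m2 / self.count if self.count > 1 else 1.0
        return (x - self.mean) / math.sqrt(var + self.eps)


# Two hypothetical agents whose environments report observations on different scales.
norm_a, norm_b = RunningNormalizer(), RunningNormalizer()
for obs in (1.0, 2.0, 3.0):
    norm_a.update(obs)
for obs in (100.0, 200.0, 300.0):
    norm_b.update(obs)

# After local normalization both agents feed comparable inputs to the shared policy.
z_a = norm_a.normalize(3.0)
z_b = norm_b.normalize(300.0)
print(round(z_a, 4), round(z_b, 4))
```

In this sketch only policy parameters would be exchanged between agents; each normalizer's mean and variance never leave the agent that computed them.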

AI · Neutral · arXiv – CS AI · May 27 · 6/10

TABX: A High-Throughput Sandbox Battle Simulator for Multi-Agent Reinforcement Learning

Researchers introduce TABX, a high-throughput multi-agent reinforcement learning simulator built on JAX that enables GPU-accelerated testing of cooperative AI algorithms. The framework prioritizes modularity and customization, allowing systematic investigation of emergent agent behaviors across varying task complexities with significantly reduced computational overhead.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Experience Sharing in Mutual Reinforcement Learning for Heterogeneous Language Models

Researchers introduce Mutual Reinforcement Learning, a framework enabling heterogeneous language models to share training experiences while maintaining separate parameters and tokenizers. The system uses three mechanisms (Shared Experience Exchange, Multi-Worker Resource Allocation, and a Tokenizer Heterogeneity Layer) to coordinate reinforcement learning across incompatible model architectures, with outcome-level success transfer showing the best stability-support trade-off.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

Conversation for Non-verifiable Learning: Self-Evolving LLMs through Meta-Evaluation

Researchers introduce CoNL, a framework that enables large language models to improve themselves through multi-agent self-play without requiring ground-truth labels or external judges. The system uses critiques that successfully improve solutions as training signals, allowing models to jointly optimize both generation and evaluation capabilities for non-verifiable tasks like creative writing and ethical reasoning.

AI · Bullish · arXiv – CS AI · Apr 14 · 6/10

Interactive Learning for LLM Reasoning

Researchers introduce ILR, a novel multi-agent learning framework that enables Large Language Models to enhance their independent reasoning through interactive training with other LLMs, then solve problems autonomously without re-executing the multi-agent system. The approach combines dynamic interaction strategies and perception calibration, delivering up to 5% performance improvements across mathematical, coding, and reasoning benchmarks.