🧠 AI | 🟢 Bullish | Importance 6/10

OM2P: Offline Multi-Agent Mean-Flow Policy

arXiv – CS AI | Zhuoran Li, Xun Wang, Hai Zhong, Qingxin Xia, Lihua Zhang, Longbo Huang | March 2, 2026 at 05:00 AM

🤖 AI Summary

Researchers propose OM2P (Offline Multi-Agent Mean-Flow Policy), an offline multi-agent reinforcement learning algorithm that uses mean-flow models to sample actions in a single step. The authors report up to a 3.8x reduction in GPU memory usage and up to a 10.8x training speed-up compared with existing diffusion- and flow-based models.

Key Takeaways

→ The authors present OM2P as the first algorithm to integrate mean-flow models into offline multi-agent reinforcement learning
→ It targets the sampling inefficiency of existing diffusion and flow-based policies by generating actions in one step
→ Reported efficiency gains reach up to a 3.8x reduction in GPU memory usage and up to 10.8x faster training
→ A reward-aware optimization scheme combines the mean-flow matching loss with Q-function supervision
→ On Multi-Agent Particle and MuJoCo benchmarks, the authors report better performance than existing methods
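
To make the one-step sampling and reward-aware loss ideas more concrete, here is a minimal Python sketch. It is not the authors' implementation. The names mean_velocity, sample_action_one_step, toy_q_value and reward_aware_loss are hypothetical, the "network" is a toy linear map, and the loss form (regression error minus a weighted Q-value) is an assumed, commonly used way of mixing a generative loss with critic supervision. The sampling rule follows the usual mean-flow convention that an action is recovered from noise as noise minus the average velocity over the interval from t = 1 to t = 0.

```python
import random


def mean_velocity(z, r, t, obs, w):
    # Hypothetical stand-in for a learned network u(z, r, t | obs):
    # a toy linear map applied per action dimension.
    return [w["z"] * zi + w["obs"] * oi + w["dt"] * (t - r)
            for zi, oi in zip(z, obs)]


def sample_action_one_step(obs, w, rng):
    # Draw noise at t = 1 and jump to t = 0 with a single model call:
    # action = noise - u(noise, r=0, t=1 | obs).
    # Toy simplification: action and observation share one dimension.
    noise = [rng.gauss(0.0, 1.0) for _ in obs]
    u = mean_velocity(noise, 0.0, 1.0, obs, w)
    return [n - ui for n, ui in zip(noise, u)]


def toy_q_value(obs, action):
    # Hypothetical critic: scores actions higher the closer they are to obs.
    return -sum((a - o) ** 2 for a, o in zip(action, obs))


def reward_aware_loss(u_pred, u_target, q_value, alpha):
    # Regression term pulls the prediction toward a mean-flow target;
    # the Q term (assumed form) favors actions the critic scores highly.
    mse = sum((p - g) ** 2 for p, g in zip(u_pred, u_target)) / len(u_pred)
    return mse - alpha * q_value


rng = random.Random(0)
w = {"z": 1.0, "obs": -1.0, "dt": 0.0}  # chosen so the sampled action equals obs
per_agent_obs = [[0.5, -0.2], [0.1, 0.3]]  # one local observation per agent

# Each agent samples its own action from its local observation, one call each.
actions = [sample_action_one_step(o, w, rng) for o in per_agent_obs]

q = toy_q_value(per_agent_obs[0], actions[0])
loss = reward_aware_loss([0.1, 0.2], [0.0, 0.0], q, alpha=0.5)
```

The point of the sketch is the cost profile: each agent needs one call to the velocity model per action, whereas a diffusion or multi-step flow policy would loop over many denoising or integration steps. In a real system the regression target would come from the mean-flow training objective and the critic would be learned from the offline dataset; both are simply passed in here.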