CHORUS: Decentralized Multi-Embodiment Collaboration with One VLA Policy
Researchers introduce CHORUS, a framework that enables decentralized multi-robot coordination using a single pretrained vision-language-action (VLA) model. Rather than requiring centralized control or per-robot policies, CHORUS lets each robot act independently using only its own observations and a robot-identifying prompt. The authors report substantial performance gains on real-world collaborative tasks.
CHORUS represents a meaningful advance in multi-robot systems. It uses transfer learning from pretrained VLA models to tackle the coordination problem without resorting to either centralized control or decentralized approaches that require explicit alignment between robots. The framework's core idea, a shared backbone conditioned on robot-specific prompts, indicates that the visuomotor priors learned by large vision-language-action models can carry over to team collaboration, where each robot operates under partial observability.
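The shared-backbone idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the CHORUS implementation: `SharedPolicy` is a hypothetical stand-in for a pretrained VLA, its "prompt conditioning" is a deterministic offset rather than a learned network, and the prompt format in `make_prompt` is invented. What it does show is the structure: one set of weights serves the whole team, each robot is distinguished only by its prompt, and no robot reads another robot's observation.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Observation:
    """What one robot sees: only its own sensor features (partial observability)."""
    robot_id: int
    image_features: List[float]


def make_prompt(robot_id: int, task: str) -> str:
    # Hypothetical robot-identifying prompt format; the real CHORUS prompt may differ.
    return f"You are robot {robot_id}. Task: {task}"


class SharedPolicy:
    """Toy stand-in for one pretrained VLA backbone whose weights every robot shares."""

    def __init__(self, action_dim: int = 2) -> None:
        self.action_dim = action_dim

    def act(self, image_features: Sequence[float], prompt: str) -> List[float]:
        # Toy prompt conditioning: a deterministic offset derived from the prompt text,
        # so identical observations can still yield robot-specific actions.
        prompt_offset = (sum(ord(c) for c in prompt) % 100) / 100.0
        mean_feature = sum(image_features) / len(image_features)
        return [mean_feature + prompt_offset * (i + 1) for i in range(self.action_dim)]


def decentralized_step(
    policy: SharedPolicy, observations: Sequence[Observation], task: str
) -> Dict[int, List[float]]:
    """One control step in which each robot queries the same policy independently.

    Each call sees only that robot's observation and prompt; nothing is
    exchanged between robots. (On real hardware each robot would run its own
    copy of the shared weights; the loop just simulates that here.)
    """
    actions: Dict[int, List[float]] = {}
    for obs in observations:
        prompt = make_prompt(obs.robot_id, task)
        actions[obs.robot_id] = policy.act(obs.image_features, prompt)
    return actions


if __name__ == "__main__":
    policy = SharedPolicy(action_dim=2)  # a single set of weights for the whole team
    team = [Observation(0, [0.5, 0.5]), Observation(1, [0.5, 0.5])]
    print(decentralized_step(policy, team, "lift the laundry basket"))
```

Even with identical observations, the two robots produce different actions because their prompts differ. Changing robot 1's observation leaves robot 0's action untouched, which mirrors the communication-free, per-robot inference the framework relies on.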
This work builds on years of robotics research exploring the trade-offs between centralized and decentralized control. Centralized methods scale poorly as team size grows, because a single controller must process every robot's observations and choose over the joint action space of the whole team. Decentralized approaches have traditionally suffered from coordination failures unless they rely on explicit communication protocols or synchronized training. CHORUS sidesteps these limitations by drawing on the implicit understanding embedded in pretrained VLA models, which have learned rich representations from diverse visual and motor experience.
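The scaling argument against centralized control can be made concrete with simple arithmetic. The sketch assumes, purely for illustration, that a centralized policy concatenates every robot's observation vector and that each robot chooses from a fixed set of discrete actions, so the centralized controller faces the joint space of all combinations. The function `policy_sizes` and the sizes used are hypothetical. They do not describe CHORUS or the baselines it was compared against; they only illustrate the general trend.

```python
from typing import Dict


def policy_sizes(n_robots: int, obs_dim: int, n_actions: int) -> Dict[str, int]:
    """Compare what one centralized policy vs. one decentralized robot must handle.

    Simplifying assumptions (hypothetical, not from CHORUS):
      - a centralized policy concatenates every robot's observation vector;
      - each robot chooses from n_actions discrete actions, so the centralized
        policy chooses over n_actions ** n_robots joint combinations.
    """
    return {
        "centralized_input_dim": n_robots * obs_dim,
        "centralized_joint_actions": n_actions ** n_robots,
        "decentralized_input_dim": obs_dim,  # each robot sees only its own view
        "decentralized_actions": n_actions,  # each robot picks only its own action
    }


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    for n in (1, 2, 4, 8):
        print(n, policy_sizes(n_robots=n, obs_dim=512, n_actions=10))
```

Under these assumptions the centralized input grows linearly with team size and the joint action space grows exponentially, while each decentralized robot's problem stays the same size. The price of decentralization is that coordination has to come from somewhere else, which is the gap CHORUS attributes to pretrained VLA priors.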
The reported results, a 64% improvement over from-scratch decentralized baselines and a 40% improvement in reactivity to teammate behavior, suggest practical advantages for real-world deployment. Experiments on physical manipulation tasks such as furniture moving and laundry basket lifting show the approach working on real hardware, not only in simulation. This matters for industries that need flexible, scalable multi-robot systems, such as logistics, manufacturing, and construction.
The implications extend beyond robotics. The success of shared pretrained models in decentralized coordination could influence how AI systems handle collaborative tasks in other domains. For AI developers and robotics companies, it suggests that investing in better VLA models may unlock new capabilities in multi-agent settings. And because robots do not need to communicate with each other at inference time, the approach is also attractive for deployments with limited bandwidth.
- →CHORUS enables decentralized multi-robot collaboration using a single shared VLA backbone without inter-robot communication at inference time.
- →Real-world experiments demonstrate 64% performance improvement over from-scratch decentralized models and superior reactivity compared to centralized baselines.
- →Each robot operates independently using only its own observations and a robot-identifying prompt, eliminating complex alignment procedures.
- →The framework leverages visuomotor priors from pretrained vision-language-action models to enable implicit team coordination.
- →Successful real-world deployment on tasks like furniture moving and library book handovers supports the practicality of the approach.