HALyPO: Heterogeneous-Agent Lyapunov Policy Optimization for Human-Robot Collaboration
Researchers developed HALyPO (Heterogeneous-Agent Lyapunov Policy Optimization), a new approach to improve stability in human-robot collaboration through multi-agent reinforcement learning. The method addresses the 'rationality gap' between human and robot learning by using Lyapunov stability conditions to prevent policy oscillations and divergence during training.