Regret Minimization with Adaptive Opponents in Repeated Games
Researchers introduce Repeated Policy Regret (RP-Regret), a new game-theoretic metric for analyzing regret minimization in repeated games with adaptive opponents who can respond to historical play. The paper proposes three algorithms to minimize RP-Regret despite its non-convex nature and demonstrates that when all players use these algorithms, certain subgame perfect equilibria can be learned, with experiments showing improved cooperation in games like Stag-Hunt.
This paper addresses a fundamental gap in game theory and online learning by extending regret minimization frameworks to account for adaptive opponent behavior in repeated games. Traditional external regret metrics fail to capture the strategic interdependencies that emerge when players anticipate counterfactual responses, creating a mismatch between theory and practical multi-agent scenarios.
The introduction of RP-Regret represents a conceptual advancement in game-theoretic analysis. Unlike existing regret notions, RP-Regret measures performance against best-in-hindsight strategies when all players can condition responses on observed histories. This native approach to repeated games enables stronger comparators and reduces constraints on opponent modeling, moving closer to realistic strategic interaction.
The algorithmic contributions address the non-convex optimization challenge inherent to RP-Regret minimization. The three proposed algorithms—oracle-based, linearized-surrogate, and slow-adaptation variants—offer practical pathways for computing near-optimal strategies. The finding that certain subgame perfect equilibria become learnable when all players minimize RP-Regret has implications for understanding equilibrium selection in multi-agent systems.
Experimentally, the approach demonstrates tangible benefits in coordination games, generating higher-utility cooperative outcomes compared to baseline methods. This suggests RP-Regret minimization could improve cooperation in distributed systems, auction mechanisms, and other multi-agent settings where strategic adaptation occurs. The work bridges theoretical game theory with practical online learning, though real-world applications depend on whether learning algorithms can be coordinated across independent agents in adversarial environments.
- →RP-Regret provides a game-theoretic metric that captures adaptive opponent behavior better than standard external regret in repeated games
- →Three algorithms address the non-convex optimization challenge of minimizing RP-Regret, each suited to different strategic scenarios
- →When all players minimize RP-Regret, certain subgame perfect equilibria become learnable without explicit coordination
- →Experimental results show RP-Regret minimization produces more cooperative outcomes with higher utility in coordination games
- →The framework enables stronger comparators and fewer constraints on opponent models compared to existing regret notions