AINeutralarXiv – CS AI · 14h ago6/10
🧠
Self-Play Reinforcement Learning under Imperfect Information in Big 2
Researchers develop a self-play reinforcement learning framework for Big 2, a four-player imperfect-information card game, demonstrating that PPO outperforms value-based methods under controlled conditions. The study reveals that entropy regularization and current-policy self-play improve agent performance, establishing Big 2 as a useful benchmark for testing deep RL in complex multi-agent environments with hidden information and variable action spaces.