AI · Neutral · arXiv – CS AI · 14h ago · 6/10
PTCG-Bench: Can LLM Agents Master Pokémon Trading Card Game?
Researchers introduce PTCG-Bench, a benchmark using the Pokémon Trading Card Game to evaluate how well large language model agents can master complex strategic games and improve through self-experience. The study reveals that while LLM agents demonstrate competent gameplay, they struggle with sustained self-evolution and are heavily influenced by system design choices.