
Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning

arXiv – CS AI | Chengshuai Shi, Wenzhe Li, Xinran Liang, Yizhou Lu, Wenjia Yang, Ruirong Feng, Seth Karten, Ziran Yang, Zihan Ding, Gabriel Sarch, Danqi Chen, Karthik Narasimhan, Chi Jin
AI Summary

Researchers introduce Odysseus, an open framework for training vision-language models (VLMs) on 100+ turn decision-making tasks with reinforcement learning, demonstrated through Super Mario Land gameplay. The trained models reach roughly 3x the average game progress of frontier models while retaining their general capabilities, advancing the frontier of embodied AI agents.

Analysis

The Odysseus framework represents a meaningful advance in scaling reinforcement learning for multi-modal AI systems. Previous attempts to apply VLMs to interactive tasks either relied on expensive supervised fine-tuning or were limited to short-horizon problems of only 20-30 decision steps. This research pushes beyond those constraints by successfully training models for 100+ turn interactions, a substantial increase in complexity that requires sustained coordination between perception, reasoning, and action selection.

The technical contribution centers on adapting PPO (Proximal Policy Optimization) with a lightweight turn-level critic, addressing fundamental training stability issues that plague longer-horizon RL. The researchers demonstrate that pretrained VLMs provide valuable action priors that dramatically improve sample efficiency compared to training deep RL agents from scratch. This insight has practical significance: it suggests that foundation models already encode useful behavioral knowledge that can be leveraged rather than discarded.
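The core ingredients named here, PPO's clipped surrogate objective combined with a critic that scores each turn, can be sketched in a few lines. This is a minimal illustration of turn-level advantage estimation (GAE) and the PPO clipped loss, not the paper's actual implementation; all function names and hyperparameter values are assumptions.

```python
import numpy as np

def turn_level_gae(rewards, values, gamma=0.99, lam=0.95):
    """Per-turn advantages via Generalized Advantage Estimation.

    rewards[t] is the reward after turn t; values is the critic's
    estimate per turn, with one extra bootstrap entry for the final state.
    """
    T = len(rewards)
    adv = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        # One-step TD error for this turn.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Discounted, lambda-weighted accumulation backward in time.
        gae = delta + gamma * lam * gae
        adv[t] = gae
    return adv

def ppo_clip_loss(logp_new, logp_old, adv, clip_eps=0.2):
    """PPO clipped surrogate objective, averaged over turns (to minimize)."""
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    return -np.mean(np.minimum(unclipped, clipped))
```

The clipping keeps each turn's policy update bounded even over long trajectories, which is the stability property the turn-level critic is meant to support.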

From an industry perspective, this work accelerates the development pathway toward embodied AI agents capable of extended autonomous decision-making. The framework's open-source nature could enable broader adoption across robotics, gaming, and simulation domains. The demonstrated cross-game generalization indicates these models develop transferable skills rather than memorizing game-specific patterns, suggesting real-world applications in dynamic environments.

The research also validates that scaling long-horizon RL for VLMs is feasible without massive computational investments. This democratizes development of embodied agents, potentially attracting more researchers to this frontier. Future work will likely focus on extending these techniques to other complex domains and reducing sample requirements further.

Key Takeaways
  • Odysseus enables VLMs to handle 100+ turn decision-making tasks, surpassing previous 20-30 turn limits through improved PPO with critic mechanisms
  • Pretrained VLM action priors substantially improve training efficiency, reducing manual engineering and computational overhead versus training from scratch
  • The framework achieves 3x average game progress improvements over frontier models while maintaining general-domain capabilities
  • Cross-game generalization demonstrates transferable skill learning rather than overfitting, indicating potential for diverse real-world applications
  • Open-source framework design could accelerate embodied AI development across robotics, simulation, and interactive environments
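The long-horizon setting described above can be pictured as a generic turn-level rollout loop: each turn, the VLM policy reads an observation, emits an action with its log-probability, and a lightweight critic scores the state for later advantage estimation. The sketch below is an illustrative assumption about the data flow, not the framework's API; every name here is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TurnRecord:
    """One turn's worth of data for turn-level PPO training."""
    obs: str
    action: str
    logp: float    # log-probability of the chosen action under the policy
    value: float   # turn-level critic's state-value estimate
    reward: float

def collect_episode(policy, critic, env_reset, env_step, max_turns=128):
    """Roll out one long-horizon episode, logging one record per turn."""
    obs = env_reset()
    traj = []
    for _ in range(max_turns):
        action, logp = policy(obs)      # VLM proposes an action for this turn
        value = critic(obs)             # critic scores the current state
        obs, reward, done = env_step(action)
        traj.append(TurnRecord(obs, action, logp, value, reward))
        if done:
            break
    return traj
```

A 100+ turn episode is just this loop running past 100 iterations; the per-turn records are what a turn-level critic and PPO update would consume.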