y0news

#self-play News & Analysis

10 articles tagged with #self-play. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Mar 11 · 7/10

PlayWorld: Learning Robot World Models from Autonomous Play

PlayWorld is a system that trains robot world simulators entirely from autonomous robot self-play, eliminating the need for human demonstrations. It achieves a 40% improvement in failure prediction and a 65% gain in policy performance when deployed in real-world scenarios.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play

Researchers introduce Vision-Zero, a self-improving AI framework that trains vision-language models through competitive games without requiring human-labeled data. The system uses strategic self-play and can work with arbitrary images, achieving state-of-the-art performance on reasoning and visual understanding tasks while reducing training costs.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3

SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning

Researchers introduce SPIRAL, a self-play reinforcement learning framework that enables language models to develop reasoning capabilities by playing zero-sum games against themselves without human supervision. The system improves performance by up to 10% across 8 reasoning benchmarks on multiple model families including Qwen and Llama.

AI · Bullish · OpenAI News · Oct 11 · 7/10 · 4

Competitive self-play

Researchers demonstrate that AI self-play training enables simulated agents to autonomously develop complex physical skills like tackling, ducking, and ball handling without explicit programming. Combined with successful Dota 2 results, this suggests self-play will be fundamental to future powerful AI systems.

AI · Bullish · OpenAI News · Aug 16 · 7/10 · 3

More on Dota 2

OpenAI's Dota 2 AI system demonstrated rapid improvement through self-play, advancing from matching high-ranked players to beating top professionals in just one month. The system showcases how self-play can drive AI performance from sub-human to superhuman levels when given sufficient computational resources.

AI · Bullish · OpenAI News · Aug 11 · 7/10 · 5

Dota 2

OpenAI has developed an AI bot that defeats world-class professional players in 1v1 Dota 2 matches under standard tournament rules. The bot learned entirely through self-play without using imitation learning or tree search techniques, representing a significant advancement in AI systems handling complex, real-world scenarios.

AI · Bullish · arXiv – CS AI · Apr 7 · 6/10

Vocabulary Dropout for Curriculum Diversity in LLM Co-Evolution

Researchers introduce vocabulary dropout, a technique to prevent diversity collapse in co-evolutionary language model training, where one model generates problems and another solves them. The method sustains proposer diversity and improves mathematical reasoning performance by an average of 4.4 points in Qwen3 models.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 5

To Deceive is to Teach? Forging Perceptual Robustness via Adversarial Reinforcement Learning

Researchers introduce AOT (Adversarial Opponent Training), a self-play framework that improves Multimodal Large Language Models' robustness by having an AI attacker generate adversarial image manipulations to train a defender model. The method addresses perceptual fragility in MLLMs when processing visually complex scenes, reducing hallucinations through dynamic adversarial training.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 3

Mastering Multi-Drone Volleyball through Hierarchical Co-Self-Play Reinforcement Learning

Researchers developed Hierarchical Co-Self-Play (HCSP), a reinforcement learning framework that enables teams of drones to learn complex 3v3 volleyball through a three-stage training process. The system achieved an 82.9% win rate against baselines and demonstrated emergent team behaviors like role switching and coordinated formations.

AI · Neutral · arXiv – CS AI · Feb 27 · 4/10 · 5

Learning-based Multi-agent Race Strategies in Formula 1

Researchers have developed a reinforcement learning approach for multi-agent Formula 1 race strategy optimization that enables AI agents to adapt pit timing, tire selection, and energy allocation in response to competitors. The framework uses only information that is available during a real race and could support race strategists' decision-making during events.