G-Zero: Self-Play for Open-Ended Generation from Zero Data
Researchers introduce G-Zero, a verifier-free framework in which large language models improve autonomously through self-play, without relying on external judges or proxy reward models. An intrinsic reward mechanism called Hint-δ identifies the Generator model's blind spots and steers training toward them, enabling scalable self-evolution in unverifiable domains, where outputs cannot be checked automatically.