GeoMin: Data-Efficient Semi-Supervised RLVR via Geometric Distribution Modeling
GeoMin, a new semi-supervised reinforcement learning method, improves LLM reasoning by using geometric distribution modeling to make better use of unlabeled data. The approach reports a 4.1% performance gain over existing semi-supervised methods and matches fully supervised models with only 10% of the annotation data.
GeoMin addresses a fundamental challenge in modern AI development: the trade-off between training performance and annotation costs. Reinforcement learning with verifiable rewards (RLVR) has proven valuable for improving LLM reasoning capabilities, but scaling these systems traditionally requires expensive human labeling. The method builds on recent semi-supervised approaches that use small labeled datasets to guide learning on unlabeled data, and it targets a critical bottleneck in them: existing methods rely on coarse performance heuristics that fail to extract value from most unlabeled instances.
The innovation centers on geometric distribution modeling, which characterizes structural differences between correct and incorrect model outputs in the labeled data. This learned prior lets the system judge how reliable its self-generated reward signals are on unlabeled data, effectively multiplying the utility of scarce annotations. The result is more data-efficient training in a setting where annotation costs remain a primary constraint for researchers and companies developing reasoning-capable language models.
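The summary does not say which features or which distribution family GeoMin uses, so the following is only a minimal sketch of the general idea under stated assumptions, not the GeoMin algorithm. It assumes each model output has already been reduced to a small numeric feature vector (the two features here are hypothetical), swaps in a simple diagonal Gaussian per class as the "prior" fitted on labeled outputs, and then uses that prior to decide whether to trust a self-generated reward on an unlabeled output. All function names, the toy data, and the 0.8 threshold are illustrative choices.

```python
import math
from statistics import fmean, pvariance


def fit_gaussian(rows):
    """Fit an independent (diagonal) Gaussian to a list of feature vectors."""
    cols = list(zip(*rows))
    means = [fmean(c) for c in cols]
    # A small floor keeps every variance strictly positive.
    variances = [max(pvariance(c), 1e-6) for c in cols]
    return means, variances


def log_density(x, params):
    """Log-density of x under a diagonal Gaussian."""
    means, variances = params
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )


def fit_prior(features, labels):
    """Model correct (1) and incorrect (0) labeled outputs separately.

    Assumes both classes appear at least once in the labeled set.
    """
    correct = [f for f, y in zip(features, labels) if y == 1]
    incorrect = [f for f, y in zip(features, labels) if y == 0]
    return {
        "correct": fit_gaussian(correct),
        "incorrect": fit_gaussian(incorrect),
        "p_correct": len(correct) / len(labels),
    }


def prob_correct(x, prior):
    """Posterior probability that an output with features x is correct."""
    log_c = math.log(prior["p_correct"]) + log_density(x, prior["correct"])
    log_i = math.log(1 - prior["p_correct"]) + log_density(x, prior["incorrect"])
    top = max(log_c, log_i)  # subtract the max for numerical stability
    e_c, e_i = math.exp(log_c - top), math.exp(log_i - top)
    return e_c / (e_c + e_i)


def filter_pseudo_reward(x, pseudo_reward, prior, threshold=0.8):
    """Keep a self-generated 0/1 reward only if the prior agrees with it."""
    p = prob_correct(x, prior)
    confidence = p if pseudo_reward == 1 else 1 - p
    return pseudo_reward if confidence >= threshold else None


# Toy labeled data: hypothetical 2-D features per output, 1 = verified correct.
labeled_features = [
    [0.90, 0.10], [0.80, 0.20], [0.85, 0.15],
    [0.20, 0.90], [0.30, 0.80], [0.25, 0.85],
]
labels = [1, 1, 1, 0, 0, 0]
prior = fit_prior(labeled_features, labels)

# An unlabeled output that resembles the correct cluster.
unlabeled = [0.85, 0.15]
kept = filter_pseudo_reward(unlabeled, 1, prior)      # prior agrees with the reward
dropped = filter_pseudo_reward(unlabeled, 0, prior)   # prior contradicts the reward
```

In this sketch, an unlabeled output contributes a training signal only when its pseudo-reward is consistent with what the labeled data says correct and incorrect outputs look like; rewards the prior contradicts are discarded rather than trusted.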
For the broader AI development ecosystem, GeoMin's results suggest that a more annotation-efficient semi-supervised method can rival fully supervised training. That is a significant finding, given that labeled data collection is a major bottleneck and expense in large-scale AI projects. Reaching supervised-equivalent performance with only 10% of the annotations has direct implications for development timelines and costs across industry applications.
The practical impact extends beyond academic interest, as data efficiency directly influences the economics of AI model development. Organizations investing in LLM reasoning capabilities stand to benefit from reduced annotation requirements, which could shorten deployment cycles and cut spending on data labeling pipelines.
- GeoMin outperforms baseline semi-supervised methods by 4.1% while matching fully supervised performance with only 10% of labeled data
- Geometric distribution modeling on labeled data enables reliable assessment of self-generated reward signals on unlabeled instances
- The approach addresses the severe data-efficiency bottleneck in semi-supervised reinforcement learning for LLMs
- Reduced annotation requirements directly lower costs and accelerate timelines for developing reasoning-capable AI systems
- The method demonstrates that structural modeling of correct vs. incorrect outputs can significantly improve unlabeled data utilization