AI · Bullish · arXiv – CS AI · 8h ago · 6/10
GeoMin: Data-Efficient Semi-Supervised RLVR via Geometric Distribution Modeling
GeoMin is a new semi-supervised reinforcement learning method that improves LLM reasoning by using geometric distribution modeling to make better use of unlabeled data. The approach reports a 4.1% performance gain over existing methods and matches fully supervised models with only 10% of the annotation data, a substantial improvement in data efficiency.