AI · Bullish · arXiv – CS AI · 17h ago · 7/10
Whatever Remains Must Be True: Filtering Drives Reasoning in LLMs, Shaping Diversity
Researchers propose a new method for training large language models (LLMs) that addresses the diversity loss problem in reinforcement learning approaches. Their technique uses the α-divergence family to better balance precision and diversity in reasoning tasks, achieving state-of-the-art performance on theorem-proving benchmarks.