AI · Bullish · arXiv – CS AI · Mar 27 · 6/10
Reaching Beyond the Mode: RL for Distributional Reasoning in Language Models
Researchers developed a multi-answer reinforcement learning approach that trains language models to generate multiple plausible answers with confidence estimates in a single forward pass, rather than collapsing to one dominant answer. The method shows improved diversity and accuracy across question-answering, medical diagnosis, and coding benchmarks while being more computationally efficient than existing approaches.