AIBullisharXiv – CS AI · 9h ago7/10
🧠
SUPERNOVA: Eliciting General Reasoning in LLMs with Reinforcement Learning on Natural Instructions
SUPERNOVA introduces a framework for extending reinforcement learning with verifiable rewards (RLVR) beyond STEM fields by systematically curating data from natural instruction datasets. A 25K-instance dataset trained on smaller models achieves 64.4 percentage point gains on complex reasoning benchmarks, with improvements generalizing across model scales and families.