AIBullisharXiv โ CS AI ยท 15h ago7/10
๐ง
RM-R1: Reward Modeling as Reasoning
Researchers introduce RM-R1, a new class of Reasoning Reward Models (ReasRMs) that integrate chain-of-thought reasoning into reward modeling for large language models. The models outperform much larger competitors including GPT-4o by up to 4.9% across reward model benchmarks by using a chain-of-rubrics mechanism and two-stage training process.
๐ง GPT-4๐ง Llama