AIBullish | arXiv · CS AI · 5h ago
Beyond Binary Preferences: A Principled Framework for Reward Modeling with Ordinal Feedback
Researchers present a new mathematical framework for training AI reward models on Likert-scale ratings instead of simple binary comparisons. The approach uses ordinal regression to capture graded human feedback, and it outperforms existing methods across chat, reasoning, and safety benchmarks.
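The summary does not give the paper's exact loss, but the standard way to apply ordinal regression to ratings is a cumulative-link (proportional-odds) model: the reward model emits a scalar score, and learned cutpoints map that score to probabilities over the Likert levels. The sketch below is a minimal NumPy illustration under that assumption; the function name `ordinal_nll` and the specific parameterization are hypothetical, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ordinal_nll(score, label, thresholds):
    """Negative log-likelihood of an ordinal (Likert) label under a
    cumulative-link model: P(y <= k) = sigmoid(theta_k - score).

    This is an illustrative sketch, not the paper's actual loss.

    score      -- scalar reward-model output for a response
    label      -- integer rating in {0, ..., K-1}
    thresholds -- sorted cutpoints theta_0 < ... < theta_{K-2}
    """
    # Pad the cumulative probabilities so P(y <= -1) = 0 and P(y <= K-1) = 1.
    cum = np.concatenate(([0.0], sigmoid(np.asarray(thresholds) - score), [1.0]))
    # Class probability is the difference of adjacent cumulative probabilities.
    prob = cum[label + 1] - cum[label]
    return -np.log(prob)
```

Training would minimize this loss over rated responses, updating both the reward model's score and the shared cutpoints, so the model learns *how much* better one response is rather than only *which* one is better.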