AI · Neutral · arXiv – CS AI · 6h ago · 6/10
Bias Fitting to Mitigate Length Bias of Reward Model in RLHF
Researchers propose FiMi-RM, a framework that identifies and corrects length bias in the reward models used for RLHF training of large language models. It uses a lightweight fitting model to capture the non-linear relationship between response length and reward, then decouples that relationship from preference scoring, reducing the reward model's tendency to favor longer responses regardless of quality.
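
To make the idea of fitting and then removing a length-reward relationship concrete, here is a minimal sketch on synthetic data. It is not the FiMi-RM implementation: the cubic polynomial in log-length, the function names `fit_length_bias` and `debias`, and the synthetic reward formula are all assumptions chosen for illustration, and the summary does not specify what fitting model the paper uses.

```python
import numpy as np

def fit_length_bias(lengths, rewards, degree=3):
    """Fit a polynomial in log(length) to raw rewards (illustrative stand-in
    for a 'lightweight fitting model'; the paper's model may differ)."""
    x = np.log(np.asarray(lengths, dtype=float))
    coeffs = np.polyfit(x, np.asarray(rewards, dtype=float), degree)
    # Return a callable that predicts the length-driven part of the reward.
    return lambda n: np.polyval(coeffs, np.log(np.asarray(n, dtype=float)))

def debias(lengths, rewards, bias_fn):
    """Subtract the fitted length component from the raw rewards."""
    return np.asarray(rewards, dtype=float) - bias_fn(lengths)

# Synthetic data (assumption): reward = true quality + a log-length bonus.
rng = np.random.default_rng(0)
lengths = rng.integers(20, 2000, size=500)
quality = rng.normal(size=500)
raw_rewards = quality + 0.8 * np.log(lengths)

bias_fn = fit_length_bias(lengths, raw_rewards)
adjusted = debias(lengths, raw_rewards, bias_fn)

# Correlation between log-length and reward, before and after the correction.
corr_before = np.corrcoef(np.log(lengths), raw_rewards)[0, 1]
corr_after = np.corrcoef(np.log(lengths), adjusted)[0, 1]
print(f"before: {corr_before:.3f}, after: {corr_after:.3f}")
```

Because the fit includes a constant term, the subtraction also centers the rewards. That is harmless for pairwise preference comparisons, which depend only on reward differences.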