arXiv — CS AI · 4d ago
Beyond RLHF and NLHF: Population-Proportional Alignment under an Axiomatic Framework
Researchers have developed a new preference learning framework that addresses bias in AI alignment by ensuring policies reflect the full population's preference distribution rather than only the majority view. The approach is grounded in axioms from social choice theory and has been validated on both recommendation tasks and large language model alignment.