AI · Bearish · arXiv CS AI · 7h ago · 7/10
Self-Preference Bias in Rubric-Based Evaluation of Large Language Models
Researchers show that Large Language Models exhibit self-preference bias when evaluating other LLMs, systematically favoring their own outputs or those of related models even when applying objective rubric-based criteria. The bias can reach 50% on objective benchmarks and produce score differences of 10 points on subjective medical benchmarks, potentially distorting model rankings and hindering AI development.