arXiv – CS AI
When Preferences Fail to Become Incentives: A Utility-Behavior Gap in Large Language Models
Researchers report a significant gap between stated preferences and actual behavior in large language models (LLMs). In choice tasks, LLMs consistently reveal coherent preference structures, including potentially misaligned ones such as nationality bias. Yet these preferences fail to motivate behavior in realistic scenarios. When offered high-utility incentives aligned with their stated preferences, the models showed no improvement in output quality across multiple writing tasks. This suggests that measured preferences may not function as genuine goals or drivers of behavior.