Alignment Makes Language Models Normative, Not Descriptive
A comparison of 120 pairs of base and aligned language models finds that alignment training makes models more normative but less descriptive of actual human behavior. Base models predict real human choices in multi-round strategic games 10 times better than their aligned counterparts, while aligned models excel only in single-shot, textbook scenarios where human behavior follows rational expectations.