Post-training makes large language models less human-like
Researchers introduced Psych-201, a dataset measuring how well large language models align with human behavior, and discovered that post-training, the process that turns base models into functional assistants, systematically reduces their human-likeness across all model families and sizes. The misalignment worsens with each new generation despite improvements in base model capabilities, suggesting that the optimization techniques that make LLMs more useful for deployment also make them worse at mimicking actual human behavior.
This research reveals a fundamental tension in large language model development: the engineering practices that improve a model's utility actively degrade its ability to represent human cognition and behavior. The study's introduction of Psych-201 enables systematic measurement of behavioral alignment, moving beyond anecdotal observations to quantitative analysis across model architectures and scales. The finding that post-training consistently reduces human-like behavior patterns contradicts the assumption that more capable models automatically become better proxies for human participants in research and simulation contexts.
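The paper's exact metric is not reproduced here, but a common way to quantify this kind of behavioral alignment is to score how much probability a model assigns to the choices human participants actually made. The sketch below assumes a hypothetical `choice_logprobs` function that returns the model's log-probability for each candidate response; the trial format and all names are illustrative, not the dataset's actual schema.

```python
import math
from typing import Callable

def behavioral_misalignment(
    trials: list[dict],
    choice_logprobs: Callable[[str, list[str]], list[float]],
) -> float:
    """Mean negative log-likelihood of human choices under a model.

    Each trial is a dict with a task "prompt", the candidate "options",
    and the index of the "human_choice" the participant actually made.
    Lower scores mean the model's choice distribution tracks human
    behavior more closely; higher scores mean greater misalignment.
    """
    total_nll = 0.0
    for trial in trials:
        logps = choice_logprobs(trial["prompt"], trial["options"])
        # Renormalize over the candidate options so the scores form a
        # proper probability distribution before scoring the human choice.
        log_z = math.log(sum(math.exp(lp) for lp in logps))
        total_nll -= logps[trial["human_choice"]] - log_z
    return total_nll / len(trials)
```

Running the same scorer on a base model and its post-trained counterpart over identical trials would give the kind of before/after comparison the study describes.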
The implications extend across multiple domains. Researchers using LLMs as substitutes for human subjects in behavioral studies face a growing validity problem: model outputs diverge from actual human responses precisely because the training that makes models reliable and safe also optimizes them away from human-like reasoning patterns. The widening misalignment in newer generations, despite base model improvements, suggests that alignment techniques, safety interventions, and instruction tuning all push models toward inhuman response patterns. This downstream consequence for AI safety and behavioral modeling research has been underappreciated in the field.
The failure of persona-induction techniques to improve individual-level predictions indicates that surface-level prompting strategies cannot overcome structural changes introduced during post-training, challenging the assumption that prompt engineering can recover human-like behavior from trained models. For developers and researchers, the practical implication is that using LLMs as behavioral surrogates requires explicit acknowledgment of systematic biases in model responses that diverge from human baselines. The findings highlight a crucial gap between model capability and model authenticity: optimization for assistant-like behavior fundamentally alters the underlying behavioral patterns models express.
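For readers unfamiliar with the term, persona induction typically means prepending a description of a specific participant to the task prompt and asking the model to respond as that person. A minimal, purely illustrative sketch follows; the profile fields and wording are assumptions, not the study's actual protocol.

```python
def persona_prompt(profile: dict, task: str) -> str:
    """Prepend a hypothetical participant description to a task prompt."""
    persona = (
        f"You are a {profile['age']}-year-old participant who is generally "
        f"{profile['trait']} when making decisions."
    )
    return f"{persona}\n\n{task}\nRespond as this participant would."

# Example: a risk-preference trial framed for one simulated individual.
prompt = persona_prompt(
    {"age": 34, "trait": "risk-averse"},
    "Option A: a sure gain of $50. Option B: a 50% chance of $120, else $0.",
)
```

The study's finding is that prompts of this kind do not close the gap at the individual level: whatever post-training removes is not recovered by conditioning on a surface description of the person.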
- Post-training reduces LLM alignment with human behavior across all model families and sizes, contradicting the assumption that capability gains make models better human proxies.
- Newer model generations show widening behavioral misalignment despite superior base model capabilities, indicating systematic effects of safety and alignment interventions.
- Persona-induction and prompt-engineering techniques fail to recover human-like behavior at the individual prediction level.
- Researchers using LLMs as human behavioral surrogates face growing validity threats due to systematic divergence from actual human response patterns.
- The optimization processes that make LLMs useful assistants actively work against maintaining human-like behavioral characteristics.