🧠 AI | 🔴 Bearish | Importance: 7/10

Post-training makes large language models less human-like

arXiv – CS AI | Marcel Binz, Elif Akata, Abdullah Almaatouq, Mohammed Alsobay, Oleksii Ariasov, Franziska Brändle, David Broska, Jason W. Burton, Nuno Busch, Frederick Callaway, Vanessa Cheung, Brian Christian, Julian Coda-Forno, Can Demircan, Vittoria Dentella, Maria K. Eckstein, Noémi Éltető, Michael Franke, Thomas L. Griffiths, Fritz Günther, Susanne Haridi, Sebastian Hellmann, Stefan Herytash, Linus Hof, Eleanor Holton, Isabelle Hoxha, Zak Hussain, Akshay Jagadish, Elif Kara, Valentin Kriegmair, Evelina Leivada, Li Ji-An, Tobias Ludwig, Maximilian Maier, Marcelo G. Mattar, Marvin Mathony, Alireza Modirshanechi, Robin Na, Mariia Nadverniuk, Antonios Nasioulas, Surabhi S. Nath, Helen Niemeyer, Kate Nussenbaum, Sebastian Olschewski, Thorsten Pachur, Stefano Palminteri, Aliona Petrenco, Camille V. Phaneuf-Hadd, Angelo Pirrone, Manuel Rausch, Laura Raveling, Shashank Reddy, Milena Rmus, Evan M. Russek, Tankred Saanum, Kai Sandbrink, Louis Schiekiera, Johannes A. Schubert, Luca M. Schulze Buschoff, Nishad Singhi, Leah H. Somerville, Mikhail S. Spektor, Xin Sui, Christopher Summerfield, Mirko Thalmann, Anna I. Thoma, Taisiia Tikhomirova, Vuong Truong, Polina Tsvilodub, Konstantinos Voudouris, Robert C. Wilson, Kristin Witte, Shuchen Wu, Dirk U. Wulff, Hua-Dong Xiong, Songlin Xu, Lance Ying, Xinyu Zhang, Jian-Qiao Zhu, Eric Schulz
🤖 AI Summary

Researchers introduced Psych-201, a dataset measuring how well large language models align with human behavior, and discovered that post-training (the process that turns base models into functional assistants) systematically reduces their human-likeness across all model families and sizes. This misalignment widens with each new generation despite improvements in base model capabilities, suggesting that the very optimization techniques that make LLMs more useful for deployment make them worse at mimicking actual human behavior.

Analysis

This research reveals a fundamental tension in large language model development: the engineering practices that improve model utility actively degrade their ability to represent human cognition and behavior. The study's introduction of Psych-201 enables systematic measurement of behavioral alignment, moving beyond anecdotal observations to quantitative analysis across model architectures and scales. The finding that post-training consistently reduces human-like behavior patterns contradicts assumptions that more capable models automatically become better proxies for human participants in research and simulation contexts.
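The article does not spell out Psych-201's scoring procedure, but behavioral alignment of this kind is commonly quantified as the negative log-likelihood (NLL) of observed human choices under a model's predicted choice probabilities: the lower the NLL, the more human-like the model. A minimal sketch, with entirely hypothetical numbers, illustrates why a confidently "correct" post-trained model can score worse than a base model on trials where humans behave non-normatively:

```python
import math

def mean_nll(model_probs, human_choices):
    """Mean negative log-likelihood of observed human choices under the
    model's predicted per-option probabilities. Lower = more human-aligned."""
    total = 0.0
    for probs, choice in zip(model_probs, human_choices):
        total += -math.log(probs[choice])
    return total / len(human_choices)

# Two toy trials with three options each (all numbers are made up).
# The base model spreads probability mass; the post-trained model is
# highly confident in the "normative" option on every trial.
base_model_probs = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]
post_trained_probs = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05]]
human_choices = [1, 0]  # humans picked option 1, then option 0

# Because humans deviate from the normative option, the overconfident
# post-trained model assigns their actual choices tiny probability and
# earns a worse (higher) NLL than the base model.
```

This is only an illustration of the metric family, not the paper's actual pipeline; `mean_nll` and the probability tables are invented for the example.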

The implications extend across multiple domains. Researchers using LLMs as substitutes for human subjects in behavioral studies face a growing validity problem: model outputs diverge from actual human responses precisely because the training that makes models reliable and safe also optimizes them away from human-like reasoning patterns. The widening misalignment in newer generations, despite base model improvements, suggests that alignment techniques, safety interventions, and instruction-tuning all push models toward inhuman response patterns. This downstream consequence for AI safety and behavioral modeling research has been underappreciated in the field.

The failure of persona-induction techniques to improve individual-level predictions indicates that surface-level prompting strategies cannot overcome structural changes introduced during post-training. This challenges the assumption that prompt engineering can recover human-like behavior from trained models. For developers and researchers, this research suggests that using LLMs as behavioral surrogates requires explicit acknowledgment of systematic biases in model responses that diverge from human baselines. The findings highlight a crucial gap between model capability and model authenticity—optimization for assistant-like behavior fundamentally alters the underlying behavioral patterns models express.
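For context on what persona induction involves: the technique conditions a model on a description of a specific participant plus their choice history before asking for the next choice. A minimal sketch of such a prompt builder, with a hypothetical helper name and fields (the paper's actual prompting setup may differ):

```python
def persona_prompt(profile, history, options):
    """Assemble a persona-induction prompt: describe the participant,
    replay their past choices, then ask for their next choice.
    `profile`, `history`, and `options` are illustrative, not the
    paper's actual schema."""
    lines = [f"You are a study participant: {profile}."]
    for trial, choice in history:
        lines.append(f"On {trial} you chose {choice}.")
    lines.append(
        f"Next trial options: {', '.join(options)}. Which do you choose?"
    )
    return "\n".join(lines)
```

The study's finding is that even prompts like this, which surface individual-level information, do not recover the human-likeness that post-training removes.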

Key Takeaways
  • Post-training reduces LLM alignment with human behavior across all model families and sizes, contradicting assumptions about model improvement.
  • Newer model generations show widening behavioral misalignment despite superior base model capabilities, indicating systematic effects of safety and alignment interventions.
  • Persona-induction and prompt-engineering techniques fail to recover human-like behavior at the individual prediction level.
  • Researchers using LLMs as human behavioral surrogates face growing validity threats due to systematic divergence from actual human response patterns.
  • The optimization processes making LLMs useful assistants actively work against maintaining human-like behavioral characteristics.