AINeutralarXiv – CS AI · Apr 206/10
🧠
Evaluating LLMs as Human Surrogates in Controlled Experiments
Researchers compared large language models with human responses in a behavioral study on accuracy perception, finding that LLMs reproduce directional effects but with inconsistent effect magnitudes across different models. The study reveals that off-the-shelf LLMs can simulate some human belief-updating patterns in controlled experiments but lack reliable human-scale accuracy, establishing clearer boundaries for when synthetic LLM data is appropriate for behavioral research.