AI · Bearish · arXiv – CS AI · Apr 20 · 6/10
The threat of analytic flexibility in using large language models to simulate human data
A new study finds that synthetic datasets generated with large language models ("silicon samples") vary widely depending on configuration choices: on the same task, correlations ranged from r = .23 to r = .84. The result suggests that analytic flexibility in LLM-based data generation poses a significant threat to the validity and reproducibility of social science research that relies on it.