Using Zero-Shot LLM-Generated Survey Data for Geographically Explicit Population Synthesis
Researchers evaluated whether zero-shot LLM-generated survey data can supplement traditional population synthesis workflows, using GPT-4 and Gemini to generate synthetic health survey records for Colorado and Mississippi. The results show that the LLMs capture geographic variation reasonably well, but performance depends on the variable. This suggests such data holds promise as a supplementary source rather than a replacement for real surveys.
This research sits at a practical intersection of generative AI and demographic data synthesis and addresses a real constraint in population modeling: comprehensive surveys are costly and slow to conduct. The study uses iterative proportional fitting (IPF) as its evaluation framework. IPF is a conventional population synthesis method that rescales a seed sample until its totals match known marginal totals. The study tests whether LLM-generated survey responses can feed existing pipelines without major architectural changes. The researchers generated records for two states chosen deliberately for their contrasting demographics, to test whether the models capture meaningful regional differences rather than generic patterns.
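IPF itself is simple to state. What follows is a minimal two-variable sketch. It assumes a seed table, for instance cross-tabulated counts from survey or LLM-generated records, and target row and column totals that sum to the same value. The function `ipf_2d` and its parameters are hypothetical illustrations, not the study's code. The function alternately rescales rows and columns until both sets of margins match.

```python
# Minimal two-dimensional iterative proportional fitting (IPF) sketch.
# Assumptions (not from the study): one seed table cross-tabulating two
# categorical variables, and row/column targets with equal grand totals.


def ipf_2d(seed, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Rescale `seed` so its row and column sums match the targets."""
    table = [[float(v) for v in row] for row in seed]
    n_rows, n_cols = len(table), len(table[0])
    for _ in range(max_iter):
        # Row step: scale each row so it sums to its target.
        for i in range(n_rows):
            s = sum(table[i])
            if s > 0:
                factor = row_targets[i] / s
                table[i] = [v * factor for v in table[i]]
        # Column step: scale each column so it sums to its target.
        for j in range(n_cols):
            s = sum(table[i][j] for i in range(n_rows))
            if s > 0:
                factor = col_targets[j] / s
                for i in range(n_rows):
                    table[i][j] *= factor
        # Columns match after the column step; stop once rows match too.
        gap = max(abs(sum(table[i]) - row_targets[i]) for i in range(n_rows))
        if gap < tol:
            break
    return table


# Hypothetical seed counts and targets, for illustration only.
seed = [[10, 20], [30, 40]]
fitted = ipf_2d(seed, row_targets=[50, 50], col_targets=[40, 60])
```

Each step multiplies a whole row or column by a single factor, so the fitted table keeps the seed's cross-product (odds) ratios. IPF fixes the margins, but the association structure comes from the seed. This is one plausible route by which errors in LLM-generated seed records could pass through the fit.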
The findings show both what the models can do and where they fall short. GPT-4 and Gemini both differentiated state-level health characteristics, which suggests that zero-shot prompting can produce geographically contextual outputs. The downstream effects were mixed, however: IPF amplified some errors and reduced others, so the relationship between synthetic data quality and pipeline robustness remains unpredictable. Strong performance on some variables alongside poor performance on others suggests that LLMs may encode certain demographic patterns reliably while hallucinating others.
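To make "amplified" versus "reduced" concrete, compare each variable's error against the reference survey before and after fitting. This minimal sketch assumes that error is the absolute difference in prevalence; the study's actual metric may differ. The function `classify_error_change`, the variable names, and the numbers are hypothetical and are not results from the study.

```python
# Sketch: did a fitting step amplify or reduce each variable's error?
# Assumption: error = absolute difference in prevalence (0-1) versus the
# reference survey. All names and values here are hypothetical.


def classify_error_change(reference, llm_raw, after_ipf, tol=1e-9):
    """Return {variable: (raw_error, fitted_error, label)}."""
    report = {}
    for var, truth in reference.items():
        raw_err = abs(llm_raw[var] - truth)
        fit_err = abs(after_ipf[var] - truth)
        if fit_err > raw_err + tol:
            label = "amplified"
        elif fit_err < raw_err - tol:
            label = "reduced"
        else:
            label = "unchanged"
        report[var] = (raw_err, fit_err, label)
    return report


# Hypothetical prevalences for illustration only (not study results).
reference = {"var_a": 0.20, "var_b": 0.30}
llm_raw = {"var_a": 0.25, "var_b": 0.28}
after_ipf = {"var_a": 0.22, "var_b": 0.35}
report = classify_error_change(reference, llm_raw, after_ipf)
for var, (raw_err, fit_err, label) in report.items():
    print(f"{var}: {raw_err:.2f} -> {fit_err:.2f} ({label})")
# var_a: 0.05 -> 0.02 (reduced)
# var_b: 0.02 -> 0.05 (amplified)
```

Reporting per variable, rather than as a single pooled score, keeps variable-dependent behavior visible instead of averaging it away.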
For practitioners in urban planning, epidemiology, and transportation modeling who rely on synthetic populations, this work maps a cautious path toward AI integration. Rather than replacing survey infrastructure, LLMs could fill gaps where survey data is sparse or speed up rapid prototyping. The variable-dependent results underscore a critical lesson: generative AI outputs require rigorous benchmarking against domain-specific ground truth before they enter production workflows. Researchers and tool developers should watch how validation methods evolve and whether hybrid approaches, which combine AI generation with targeted survey data, become the practical standard.
- LLMs generate geographically differentiated synthetic survey data in zero-shot settings, capturing state-level health contrasts between Colorado and Mississippi.
- Performance varies significantly by variable, with some health metrics aligning well with ground truth while others diverge substantially.
- Iterative proportional fitting sometimes amplifies LLM-generated errors and sometimes reduces them, indicating unpredictable downstream effects.
- Spatial validation at the census tract level shows reasonable pattern reproduction for variables that align more closely with real survey data.
- LLM-generated survey data shows promise as supplementary input for population synthesis but cannot yet replace traditional survey data sources.
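The tract-level spatial validation summarized above can be illustrated by comparing per-tract estimates from synthetic and reference data. This is a sketch under stated assumptions. The study's exact spatial metric is not given here, so Pearson correlation (pattern agreement) and mean absolute error (level agreement) are illustrative choices. The function `tract_agreement` and the tract IDs are hypothetical.

```python
import math

# Sketch: agreement between synthetic and reference tract-level estimates.
# Assumption: Pearson r and mean absolute error stand in for whatever
# metric the study used; inputs map hypothetical tract IDs to prevalences.


def tract_agreement(synthetic, reference):
    """Return (pearson_r, mean_abs_error) over tracts present in both."""
    tracts = sorted(set(synthetic) & set(reference))
    if len(tracts) < 2:
        raise ValueError("need at least two shared tracts")
    xs = [synthetic[t] for t in tracts]
    ys = [reference[t] for t in tracts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    r = cov / (sx * sy)
    mae = sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)
    return r, mae
```

The two numbers answer different questions. A high correlation with a large mean absolute error means the synthetic data reproduces the spatial pattern but is shifted in level, which an IPF step constrained to known tract totals may or may not correct.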