Generating Public Health Responses using Survey-Augmented Large Language Models
Researchers investigated whether large language models can generate synthetic survey responses that mimic real population data on health behaviors and vaccination attitudes. While LLMs successfully reproduced demographic distributions and broad vaccination trends across epidemic waves, they failed to capture correlations between factors within individual respondents and remained identifiable as synthetic, suggesting LLM-generated data could support exploratory modeling but requires further validation before replacing human surveys.
This research addresses a practical bottleneck in epidemiological modeling: the high cost and limited scalability of the repeated large-scale surveys needed to inform public health decision-making. By testing whether LLMs can generate synthetic survey responses, the researchers explore a potential way to accelerate vaccine hesitancy research and epidemic preparedness. Using FluPaths longitudinal data and cluster-informed prompting, they evaluated multiple models' ability to reproduce real-world health behavior patterns across different epidemic waves.
The findings reveal both promise and significant limitations. LLMs successfully reproduced aggregate-level distributions of demographics, vaccination beliefs, and risk perceptions, which is useful for initial exploratory analysis. However, the models struggled with within-respondent correlations, meaning synthetic profiles often paired beliefs and behaviors in unrealistic combinations. This limitation matters because epidemiological models need each simulated individual's beliefs and behaviors to be mutually consistent in order to predict actual decision-making patterns.
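The gap between matching aggregates and matching within-respondent structure can be illustrated with a toy example. The sketch below is not the study's data or method: the two variables (a 1-5 belief score and a vaccination flag), the sample size, and the response probabilities are all assumptions chosen for illustration. It builds a "synthetic" sample by shuffling one column of a "real" sample, so every marginal distribution is identical while the pairing of belief and behavior within each respondent is lost.

```python
import random
from collections import Counter


def pearson(xs, ys):
    """Plain Pearson correlation for two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(0)

# Hypothetical "real" respondents: stronger belief in vaccine
# effectiveness (1-5) makes vaccination (0/1) more likely.
real = []
for _ in range(2000):
    belief = rng.randint(1, 5)
    p_vaccinated = 0.1 + 0.2 * (belief - 1)  # assumed: 0.1 up to 0.9
    real.append((belief, int(rng.random() < p_vaccinated)))

# Hypothetical "synthetic" respondents: same answers overall, but the
# vaccination column is shuffled, breaking the link inside each record.
beliefs = [b for b, _ in real]
vaccinated = [v for _, v in real]
rng.shuffle(vaccinated)
synthetic = list(zip(beliefs, vaccinated))

# Aggregate check: both marginal distributions are identical.
same_marginals = (
    Counter(b for b, _ in real) == Counter(b for b, _ in synthetic)
    and Counter(v for _, v in real) == Counter(v for _, v in synthetic)
)

# Within-respondent check: correlation between belief and behavior.
real_r = pearson(*zip(*real))
synth_r = pearson(*zip(*synthetic))

print("marginals match:", same_marginals)
print("real correlation:", round(real_r, 3))
print("synthetic correlation:", round(synth_r, 3))
```

In this toy setup an aggregate comparison alone would report a perfect match, while the belief-behavior correlation drops to roughly zero, which is the kind of discrepancy a within-respondent check is meant to surface.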
The results have implications for AI-assisted public health research and agent-based modeling. While synthetic data augmentation could accelerate hypothesis generation and reduce survey costs, treating LLM-generated responses as substitutes for human data risks introducing systematic bias into epidemic predictions. The fact that classifiers could distinguish synthetic from real records indicates the generated data is not yet realistic enough for high-stakes applications.
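A classifier-based realism check of this kind can be sketched in a few lines. The summary does not say which classifier or features the researchers used, so everything here is an assumption: the two toy variables, the data-generating probabilities, and the simple frequency-count classifier. The idea is only that if a model trained to separate real from synthetic records beats 50% accuracy on held-out data, the two samples differ in some detectable way.

```python
import random
from collections import Counter

rng = random.Random(1)


def real_record():
    # Hypothetical real respondent: belief (1-5) drives vaccination (0/1).
    belief = rng.randint(1, 5)
    return (belief, int(rng.random() < 0.1 + 0.2 * (belief - 1)))


def synthetic_record():
    # Hypothetical synthetic respondent: same marginals in expectation,
    # but belief and vaccination are generated independently.
    return (rng.randint(1, 5), int(rng.random() < 0.5))


def sample(make_record, n):
    return [make_record() for _ in range(n)]


train_real, test_real = sample(real_record, 2000), sample(real_record, 1000)
train_syn, test_syn = sample(synthetic_record, 2000), sample(synthetic_record, 1000)

# Toy classifier: for each (belief, vaccinated) combination, remember
# whether it was seen more often in real or in synthetic training data.
real_counts = Counter(train_real)
syn_counts = Counter(train_syn)


def predict_is_synthetic(record):
    return syn_counts[record] > real_counts[record]


# Held-out accuracy; 0.5 would mean the samples are indistinguishable
# to this classifier.
correct = sum(not predict_is_synthetic(r) for r in test_real)
correct += sum(predict_is_synthetic(r) for r in test_syn)
accuracy = correct / (len(test_real) + len(test_syn))

print("held-out accuracy:", round(accuracy, 3))
```

Accuracy near 0.5 would not prove the synthetic data is faithful, only that this particular classifier found nothing to exploit; a stronger model or richer features might still separate the two.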
Future work should focus on improving within-respondent consistency and developing better validation frameworks. Organizations planning to use LLM-augmented data for public health modeling need rigorous comparative studies demonstrating that synthetic augmentation does not distort policy-relevant estimates. This research highlights a broader tension in applying LLMs: in this study the models reproduced aggregate patterns well but did not capture the individual-level coherence required for trustworthy applications in health and safety domains.
- LLMs successfully reproduce aggregate demographic and vaccination belief distributions but fail to capture realistic correlations between attitudes within individual respondents.
- Synthetic survey data remained classifiable as artificial, indicating insufficient realism for direct substitution of human surveys in epidemiological modeling.
- Model performance varied across different epidemic waves, suggesting LLM-generated data reliability depends on temporal context.
- The research supports LLM-generated data for exploratory augmentation and hypothesis generation rather than as primary data sources.
- Further methodological improvements and validation protocols are necessary before deploying synthetic survey data in public health decision-making.