An Infectious Disease Spread Simulation Based on Large Language Model Decision Making
Researchers developed an agent-based simulation framework that uses large language models to model individual decision-making during infectious disease outbreaks, integrating LLM-generated behavioral choices into spatially grounded synthetic populations across real cities. The study found that income and education are the primary factors determining disease reporting rates, with geography and message framing playing secondary roles in shaping public health responses.
This research sits at the intersection of computational epidemiology and applied AI. By using LLMs to simulate human behavioral responses to disease outbreaks, the study addresses a gap in public health modeling: it moves beyond aggregate population statistics to capture heterogeneity in individual decision-making, grounded in actual demographic and geographic data.
The work builds on established research showing that LLMs can replicate human behavior patterns, and extends it by adding a spatial dimension through census data from San Francisco and Atlanta. This geographic grounding turns abstract behavioral simulation into policy-relevant analysis that reflects real-world inequality patterns. Testing three decision scenarios (independent reasoning, household influence, and message framing) shows how public health communications might influence reporting behavior differently across populations.
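To make the three decision scenarios concrete, here is a minimal sketch of how a per-agent prompt might be assembled for each one. Everything in it is an illustrative assumption rather than the study's actual implementation: the `Agent` fields, the prompt wording, and the `build_prompt` and `parse_decision` helpers are hypothetical, and the call to a language model is deliberately left out.

```python
from dataclasses import dataclass

# Hypothetical agent schema; the study's real attribute set is not given here.
@dataclass
class Agent:
    age: int
    income: int                 # annual household income in USD (assumed unit)
    education: str
    tract: str                  # census tract label (assumed identifier)
    household_reports: int = 0  # household members who already reported

SCENARIOS = ("independent", "household", "framing")

def build_prompt(agent: Agent, scenario: str, message: str = "") -> str:
    """Assemble an illustrative decision prompt for one agent and scenario."""
    if scenario not in SCENARIOS:
        raise ValueError(f"unknown scenario: {scenario!r}")
    lines = [
        f"You are a {agent.age}-year-old resident of census tract {agent.tract}.",
        f"Your household income is ${agent.income:,} "
        f"and your education level is {agent.education}.",
        "You have flu-like symptoms.",
    ]
    if scenario == "household":
        # Household influence: expose what other household members did.
        lines.append(
            f"{agent.household_reports} member(s) of your household "
            "have already reported their illness."
        )
    elif scenario == "framing":
        # Message framing: prepend a public health message to the decision.
        lines.append(f"Public health message: {message}")
    lines.append("Do you report your illness? Answer YES or NO.")
    return "\n".join(lines)

def parse_decision(reply: str) -> bool:
    """Map a free-text model reply to a report / no-report decision."""
    return reply.strip().upper().startswith("YES")
```

In a full simulation, the string returned by `build_prompt` would be sent to a language model and its reply passed through `parse_decision`; comparing decision rates across the three scenarios and across income and education groups would then give the kind of breakdown the study describes.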
The finding that income and education dominate variation in reporting has significant implications for health equity and outbreak response planning. It suggests that under uniform public health messaging, self-reported cases may systematically undercount disease prevalence in lower-income communities, so outbreak severity could be underestimated where vulnerability is highest. Knowing which groups are likely to underreport allows interventions to be targeted at them.
For epidemiologists and public health planners, this framework offers a tool to stress-test interventions before deployment and identify vulnerable populations likely to be missed by standard reporting mechanisms. The methodology could extend beyond influenza to model behavioral responses across diverse health crises. Future applications might incorporate real-time reporting data to validate LLM-generated simulations or explore how misinformation affects compliance with public health guidance.
- LLMs can simulate realistic disease-reporting decisions when grounded in demographic and geographic context data.
- Income and education emerge as dominant factors explaining why some populations underreport illness symptoms.
- Spatial heterogeneity in synthetic populations enables bias-aware epidemiological modeling aligned with real-world inequality patterns.
- Message framing and household influence show measurable but secondary effects on self-reported illness outcomes.
- Framework demonstrates potential for testing public health interventions before implementation to identify equity gaps.