AI · Bearish · arXiv – CS AI · 9h ago · 7/10
Epidemiology of Model Collapse: Modeling Synthetic Data Contamination via Bilayer SIR Dynamics
Researchers propose a bilayer SIR (susceptible-infected-recovered) epidemic model to analyze how synthetic data contamination spreads across AI systems when models train on each other's outputs. Using theoretical analysis, simulations, and GPT-2 experiments, they show that cross-contamination can become self-sustaining, with a basic reproduction number R₀ > 1, and they identify detection-based filtering as the most effective intervention strategy.
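
To make the R₀ > 1 threshold concrete, here is a minimal sketch of a generic two-layer SIR system in Python. It is not the paper's model: the layer coupling, the rate values, the `filter_transmission` helper, and the idea that filtering simply scales every transmission rate by `1 - efficacy` are all assumptions made for illustration. R₀ is computed as the spectral radius of the standard 2x2 next-generation matrix.

```python
import math

# Hypothetical rates, chosen only for illustration (not taken from the paper).
# BETA[i][j]: rate at which contaminated sources in layer j contaminate layer i.
# Zero diagonals mean contamination spreads only across layers.
BETA = [[0.0, 0.6], [0.6, 0.0]]
GAMMA = [0.2, 0.2]  # per-layer recovery (cleanup) rates


def r0_bilayer(beta, gamma):
    """Spectral radius of the next-generation matrix K[i][j] = beta[i][j] / gamma[j]."""
    a = beta[0][0] / gamma[0]
    b = beta[0][1] / gamma[1]
    c = beta[1][0] / gamma[0]
    d = beta[1][1] / gamma[1]
    # Eigenvalues of a 2x2 matrix; the discriminant is non-negative
    # because every entry of K is non-negative.
    disc = (a - d) ** 2 + 4.0 * b * c
    return 0.5 * (a + d + math.sqrt(disc))


def filter_transmission(beta, efficacy):
    """Assumed intervention: filtering removes a fraction `efficacy` of all transmission."""
    return [[rate * (1.0 - efficacy) for rate in row] for row in beta]


def final_size(beta, gamma, i0=(0.01, 0.0), dt=0.01, steps=20000):
    """Forward-Euler integration of the coupled SIR equations.

    Returns the recovered fraction in each layer at the end of the run.
    """
    s = [1.0 - i0[0], 1.0 - i0[1]]
    i = [i0[0], i0[1]]
    r = [0.0, 0.0]
    for _ in range(steps):
        new_inf = [s[k] * (beta[k][0] * i[0] + beta[k][1] * i[1]) for k in range(2)]
        rec = [gamma[k] * i[k] for k in range(2)]
        for k in range(2):
            s[k] -= dt * new_inf[k]
            i[k] += dt * (new_inf[k] - rec[k])
            r[k] += dt * rec[k]
    return r


if __name__ == "__main__":
    filtered = filter_transmission(BETA, 0.8)
    print("R0 without filtering:", r0_bilayer(BETA, GAMMA))
    print("R0 with filtering:   ", r0_bilayer(filtered, GAMMA))
    print("final size without filtering:", final_size(BETA, GAMMA))
    print("final size with filtering:   ", final_size(filtered, GAMMA))
```

With both within-layer rates set to zero, R₀ reduces to the geometric mean of the two cross-layer ratios, so in this toy setup contamination persists purely through cross-layer feedback. Under the assumed scaling, a sufficiently strong filter pushes R₀ below 1 and the outbreak dies out.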