TSFMAudit: Data Contamination Auditing in Forecasting Time Series Foundation Models
Researchers introduce TSFMAudit, the first systematic method for detecting data contamination in time series foundation models (TSFMs) pretrained on large datasets. The approach identifies contamination by analyzing how quickly models adapt to evaluation data, with contaminated datasets showing unusually efficient loss reduction and minimal backbone movement during fine-tuning.
Time series foundation models are an important emerging class of machine learning models. Because they are pretrained on massive datasets, it is hard to verify what data they have already seen. When evaluation datasets overlap with pretraining corpora, performance metrics become artificially inflated, misleading researchers and practitioners about the models' true capabilities. TSFMAudit addresses this transparency gap with the first contamination detection framework designed specifically for time series data, rather than an adaptation of auditing techniques developed for large language models.
The research stems from a fundamental problem in machine learning reproducibility. Contamination checks for text-based foundation models can often fall back on matching strings against a documented corpus. Time series data presents distinct obstacles: continuous-valued signals, heterogeneous sources, and sparse metadata make contamination much harder to detect, and string matching approaches fail on temporal data. The probe adaptation dynamics method instead exploits a behavioral signature of contamination. On data the model has already seen, a short fine-tuning probe reduces loss disproportionately fast while the model's weights barely move, a pattern that standard evaluation metrics do not reveal.
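To make the two signals concrete, here is a minimal Python sketch. It assumes a short fine-tuning probe has already been run on one evaluation dataset, with the per-step losses and snapshots of the backbone weights recorded before and after. The function names, the relative-L2 measure of backbone movement, and the ratio used to combine the two signals are illustrative assumptions, not TSFMAudit's published formulas; the probe itself (model, optimizer, step count) is omitted.

```python
import numpy as np


def relative_loss_drop(losses):
    """Fraction of the initial probe loss removed by the last probe step."""
    losses = np.asarray(losses, dtype=float)
    return (losses[0] - losses[-1]) / losses[0]


def backbone_displacement(theta_before, theta_after):
    """Relative L2 distance moved by the flattened backbone weights.

    Each argument is a list of weight arrays (one per backbone tensor).
    """
    before = np.concatenate([np.ravel(p) for p in theta_before])
    after = np.concatenate([np.ravel(p) for p in theta_after])
    return np.linalg.norm(after - before) / np.linalg.norm(before)


def contamination_score(losses, theta_before, theta_after, eps=1e-8):
    """Hypothetical score: loss removed per unit of backbone movement.

    Larger values mean the probe fits the data quickly while barely
    changing the weights, the pattern described as the contamination
    signature. eps guards against division by zero.
    """
    drop = relative_loss_drop(losses)
    move = backbone_displacement(theta_before, theta_after)
    return drop / (move + eps)


# Synthetic stand-ins for real probe runs (hypothetical numbers).
rng = np.random.default_rng(0)
theta0 = [rng.normal(size=(64, 64)), rng.normal(size=64)]

# Dataset A: loss falls sharply while weights barely move (suspicious).
theta_a = [p + 1e-3 * rng.normal(size=p.shape) for p in theta0]
score_a = contamination_score([1.0, 0.4, 0.2, 0.15], theta0, theta_a)

# Dataset B: loss falls slowly and weights move more (typical unseen data).
theta_b = [p + 5e-2 * rng.normal(size=p.shape) for p in theta0]
score_b = contamination_score([1.0, 0.9, 0.8, 0.75], theta0, theta_b)

print(score_a > score_b)  # True
```

Dividing the loss drop by the displacement folds both signals into one number, but it is only one possible design; a real audit could just as well keep the two signals separate or use a learned classifier. Any decision threshold would need calibration against datasets whose contamination status is known.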
For the broader AI industry, this work lays the groundwork for credible benchmarking. Developers who rely on inflated performance claims risk building systems around unrealistic expectations. The evaluation against 187 datasets across 6 different TSFMs demonstrates practical applicability. A standard contamination audit of this kind would strengthen scientific integrity and help protect downstream applications from model selections based on inflated scores.
The implications extend to financial forecasting systems, energy prediction models, and other critical applications that depend on accurate time series forecasts. As TSFMs proliferate in production environments, reliable contamination auditing becomes an essential part of risk management. Likely next steps include automated auditing pipelines and protocols for industry-wide adoption.
- →TSFMAudit is the first contamination auditing method specifically designed for time series foundation models, addressing a previously unexamined verification gap.
- →The method detects contamination through probe adaptation dynamics, flagging unusually efficient loss reduction combined with minimal backbone weight movement during fine-tuning as the contamination signature.
- →Evaluation across 187 datasets and 6 TSFMs, with the models' documented training sources serving as ground truth, demonstrates practical viability.
- →Accurate contamination detection is critical for reliable benchmarking in financial forecasting, energy prediction, and other production applications.
- →This framework establishes credible auditing infrastructure to prevent inflated performance claims from misleading downstream model selection and deployment decisions.
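Because the evaluation uses the models' documented training sources as ground truth, detection quality can be summarized by how well contamination scores separate known-contaminated datasets from clean ones. The plain-Python sketch below computes AUROC using its pairwise-comparison (Mann-Whitney) definition. Choosing AUROC as the metric, and the example scores and labels, are assumptions for illustration, not results reported for TSFMAudit.

```python
def auroc(scores, labels):
    """Probability that a random contaminated dataset outscores a clean one.

    scores: contamination scores, one per dataset (higher = more suspicious).
    labels: True if documented training sources list the dataset, else False.
    Ties between a contaminated and a clean dataset count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one contaminated and one clean dataset")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for five datasets audited against one model.
scores = [850.0, 12.0, 640.0, 5.0, 30.0]
labels = [True, False, True, False, False]
print(auroc(scores, labels))  # 1.0
```

An AUROC of 1.0 means every known-contaminated dataset scored above every clean one, and 0.5 is chance level. Across several TSFMs, one would typically compute this per model before averaging, since score scales may differ between architectures.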