Interpretable Machine Learning for Football Performance Analysis: Evidence of Limited Transferability from Elite Leagues to University Competition
Researchers found that machine learning models trained on elite European football leagues lose interpretability and reliability when applied to university-level competition, suggesting that performance insights don't transfer across competition tiers. The study reveals that explanation stability and feature importance hierarchies are domain-dependent, challenging the assumption that ML-derived performance determinants are universally applicable.
This research addresses a critical gap in applied machine learning: the assumption that models trained on high-quality data remain interpretable when deployed in different contexts. The study trained Random Forest and Multilayer Perceptron models on event data from Europe's top five football leagues, then applied them to university-level football data while keeping the feature space identical. The results exposed substantial instability in explanations under domain shift: the importance rankings of key performance indicators reordered significantly, and different explanation methods showed reduced agreement with one another.
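The core measurement here can be illustrated with a small sketch. The code below is not the paper's pipeline; it uses synthetic data and assumed placeholder KPI names to show how one might quantify feature-importance reordering between a training domain and a deployment domain, using permutation importance and Kendall's rank correlation.

```python
# Hypothetical sketch: quantify feature-importance reordering under domain
# shift. Synthetic data and KPI names are illustrative, not from the study.
import numpy as np
from scipy.stats import kendalltau
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n, d = 2000, 6
feature_names = [f"kpi_{i}" for i in range(d)]  # placeholder KPI names

# "Elite" domain: outcome driven mainly by features 0 and 1.
X_elite = rng.normal(size=(n, d))
y_elite = (2.0 * X_elite[:, 0] + 1.0 * X_elite[:, 1]
           + 0.3 * rng.normal(size=n) > 0).astype(int)

# "University" domain: same feature space, but features 2 and 3 dominate,
# mimicking a structural difference between competition tiers.
X_uni = rng.normal(size=(n, d))
y_uni = (2.0 * X_uni[:, 2] + 1.0 * X_uni[:, 3]
         + 0.3 * rng.normal(size=n) > 0).astype(int)

# Train on the elite domain only, mirroring the transfer scenario.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_elite, y_elite)

# Permutation importance evaluated separately on each domain.
imp_elite = permutation_importance(
    model, X_elite, y_elite, n_repeats=10, random_state=0).importances_mean
imp_uni = permutation_importance(
    model, X_uni, y_uni, n_repeats=10, random_state=0).importances_mean

# Kendall's tau near 1 means stable importance rankings across domains;
# low or negative values signal the kind of reordering the study reports.
tau, _ = kendalltau(imp_elite, imp_uni)
print(f"rank agreement (Kendall tau): {tau:.2f}")
```

A low tau on its own does not say *why* rankings moved, but it gives a cheap, model-agnostic number to track when a model is moved between competition tiers.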
The findings reflect broader challenges in machine learning deployment across specialized domains. Elite leagues generate consistent, high-quality data with standardized playing conditions, creating stable patterns that algorithms readily capture. University competition introduces structural variability—different player skill distributions, tactical sophistication, and game dynamics—that fundamentally alters which factors drive performance. The research demonstrates that instability in ML explanations isn't merely a methodological artifact but rather a diagnostic signal of genuine structural differences between domains.
For practitioners developing performance analysis tools, this research suggests that models cannot be naively transferred between competition levels without retraining and revalidation. Organizations developing football analytics platforms must recognize that insights derived from elite data may mislead when applied to lower-tier competitions. The implications extend beyond sports analytics to any domain where performance models must operate across heterogeneous populations or conditions.
Looking forward, this work highlights the importance of domain-aware model development and the need for interpretability methods that explicitly flag reliability degradation under distribution shift. Future research should explore techniques for detecting when learned patterns lose robustness and methods for adapting explanations to new domains.
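One simple building block for such reliability flags is a per-feature distribution-shift check before trusting transferred explanations. The sketch below is an assumed minimal approach (not from the study), using a two-sample Kolmogorov-Smirnov test with an illustrative significance threshold and invented KPI names.

```python
# Hypothetical sketch: flag features whose marginal distribution has shifted
# between the training domain and a deployment domain, as a cheap warning
# that transferred explanations may be unreliable. Threshold is illustrative.
import numpy as np
from scipy.stats import ks_2samp

def shifted_features(X_train, X_deploy, names, alpha=0.01):
    """Return names of features whose marginal distribution differs
    significantly between domains (two-sample Kolmogorov-Smirnov test)."""
    flagged = []
    for j, name in enumerate(names):
        _, p = ks_2samp(X_train[:, j], X_deploy[:, j])
        if p < alpha:
            flagged.append(name)
    return flagged

rng = np.random.default_rng(1)
names = ["passes", "duels", "shots"]   # placeholder KPI names
X_train = rng.normal(0.0, 1.0, size=(1000, 3))
X_deploy = X_train.copy()              # identical by construction ...
X_deploy[:, 1] += 1.5                  # ... except one deliberately shifted KPI

print(shifted_features(X_train, X_deploy, names))  # → ['duels']
```

Marginal tests like this miss joint-distribution shifts, so in practice they would be one signal among several rather than a complete drift detector.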
- Machine learning models trained on elite football leagues show unstable and unreliable feature importance when applied to university-level competition
- Performance determinants exhibit domain-dependent interpretability, suggesting that ML insights cannot be universally transferred across competition tiers
- Explanation instability under domain shift serves as a diagnostic indicator of structural differences rather than a purely methodological limitation
- Random Forest and Multilayer Perceptron models both produced explanations that diverged from elite-league patterns when applied to university data
- Analytics platforms must retrain and revalidate models for different competition levels rather than relying on transfer learning from elite leagues