The Good, the Bad, and the Ugly of Markov Boundary for Tabular Prediction
A comprehensive study of Markov boundaries in tabular prediction finds that oracle boundaries substantially improve model performance, but practical causal discovery methods cannot recover them cost-effectively. The study traces this gap to a misalignment between what structural recovery optimizes and what predictive performance rewards. It concludes that prediction-focused feature selection needs different approaches from the ones the theory assumes.
This research addresses a gap between machine learning theory and practice in tabular prediction. The Markov boundary of a target is the minimal set of variables that renders the target conditionally independent of all other variables, so in principle it is the smallest feature set needed for prediction. Yet practitioners continue to train on the full feature set. The SCM3K benchmark spans 3,450 tasks with varying feature dimensionality and six structural causal model families. It provides evidence that this theory-practice divide stems from three concrete problems rather than from a fundamental flaw in the concept itself.
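When the true graph is known, as it is for tasks generated from a structural causal model, the oracle Markov boundary can be read directly off the DAG. Under the usual faithfulness assumption it consists of the target's parents, its children, and the other parents of its children (spouses). The sketch below assumes a DAG stored as a hypothetical `parents` dictionary mapping each node to its direct causes; the function name and toy graph are illustrative and are not taken from the SCM3K benchmark.

```python
from typing import Dict, Set

def markov_boundary(parents: Dict[str, Set[str]], target: str) -> Set[str]:
    """Parents, children, and spouses (co-parents of children) of `target`.

    `parents` maps every node of a DAG to the set of its direct causes.
    """
    pa = set(parents.get(target, set()))
    # Children are the nodes that list `target` among their parents.
    children = {node for node, ps in parents.items() if target in ps}
    # Spouses are the other parents of those children.
    spouses = set()
    for child in children:
        spouses |= parents[child]
    return (pa | children | spouses) - {target}

# Hypothetical toy SCM: D -> A -> Y -> C <- B, with E isolated.
dag = {
    "D": set(),
    "A": {"D"},
    "Y": {"A"},
    "B": set(),
    "C": {"Y", "B"},
    "E": set(),
}
print(sorted(markov_boundary(dag, "Y")))  # ['A', 'B', 'C']
```

Here D is excluded because A screens it off from Y, and E is excluded because it is unconnected to Y, so a model restricted to {A, B, C} loses no population-level predictive information.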
The findings point to three problems. First, false negatives and false positives in causal discovery carry sharply different predictive costs, yet standard discovery algorithms optimize structural accuracy rather than predictive accuracy. Second, modern causal discovery methods exhaust their computational budgets before reaching the feature regimes where Markov boundaries help most, which are precisely the sparse, high-dimensional settings that most need dimensionality reduction. Third, the oracle Markov boundary is only one of many feature sets that can match or exceed full-feature performance.
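The asymmetry can be made concrete with a small sketch. A structure-style score, such as the size of the symmetric difference between a selected set and the oracle boundary (a set-level analogue of structural Hamming distance), charges one unit for every error, so it cannot distinguish a missed boundary feature from a superfluous one. A type-aware score can. The function names and the `w_fn` and `w_fp` weights below are hypothetical; the weights are arbitrary placeholders chosen only to show that the two cases can be separated, not estimates of which error type is costlier.

```python
from typing import Set

def symmetric_error(selected: Set[str], oracle: Set[str]) -> int:
    """Structure-style score: each missed or extra feature costs exactly 1."""
    return len(selected ^ oracle)

def weighted_error(selected: Set[str], oracle: Set[str], w_fn: float, w_fp: float) -> float:
    """Type-aware score: false negatives and false positives weighted separately."""
    fn = len(oracle - selected)  # boundary features that were missed
    fp = len(selected - oracle)  # features kept that are not in the boundary
    return w_fn * fn + w_fp * fp

oracle = {"A", "B", "C"}
missed_one = {"A", "B"}           # one false negative
extra_one = {"A", "B", "C", "E"}  # one false positive

print(symmetric_error(missed_one, oracle), symmetric_error(extra_one, oracle))  # 1 1
# Placeholder weights, chosen only to show the two cases can be separated.
print(weighted_error(missed_one, oracle, w_fn=5.0, w_fp=1.0))  # 5.0
print(weighted_error(extra_one, oracle, w_fn=5.0, w_fp=1.0))   # 1.0
```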
For the machine learning and causal inference communities, this work reframes feature selection as fundamentally prediction-aligned rather than structure-aligned. It suggests that developing discovery methods optimized for predictive rather than structural fidelity could unlock significant gains in tabular modeling. The implications extend to AutoML systems and data scientists who must balance theoretical elegance against computational constraints. This research should prompt a reconsideration of how causal discovery methods are evaluated and how their outputs are integrated into prediction pipelines, moving beyond structural metrics toward predictive utility measures.
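For contrast with structure-first pipelines, one long-standing way to make selection prediction-aligned is a wrapper method: greedy forward selection that adds whichever feature most reduces held-out error and stops when no feature helps. This is a generic textbook technique, not the method proposed or evaluated in the study. The sketch assumes NumPy, a linear model fit by ordinary least squares, hypothetical helper names, and a hypothetical synthetic dataset.

```python
import numpy as np

def holdout_mse(X_tr, y_tr, X_va, y_va, cols):
    """Fit OLS with an intercept on the given columns and return validation MSE."""
    def design(X):
        # Intercept column followed by the selected feature columns.
        return np.column_stack([np.ones(len(X))] + [X[:, c] for c in cols])

    coef, *_ = np.linalg.lstsq(design(X_tr), y_tr, rcond=None)
    resid = y_va - design(X_va) @ coef
    return float(np.mean(resid ** 2))

def forward_select(X_tr, y_tr, X_va, y_va, tol=1e-6):
    """Greedy forward selection scored by held-out error, not by graph structure."""
    selected = []
    best = holdout_mse(X_tr, y_tr, X_va, y_va, selected)  # intercept-only baseline
    remaining = set(range(X_tr.shape[1]))
    while remaining:
        # Score every remaining candidate when added to the current set.
        scores = {c: holdout_mse(X_tr, y_tr, X_va, y_va, selected + [c]) for c in remaining}
        c_best = min(scores, key=scores.get)
        if best - scores[c_best] <= tol:  # stop when no candidate improves enough
            break
        selected.append(c_best)
        remaining.remove(c_best)
        best = scores[c_best]
    return selected, best

# Hypothetical synthetic data: only columns 0 and 3 drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.normal(size=400)
sel, mse = forward_select(X[:300], y[:300], X[300:], y[300:])
print(sel, round(mse, 4))
```

Each step refits the model once per remaining candidate, so for p features the search needs on the order of p² fits in the worst case.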
- Oracle Markov boundaries substantially improve prediction accuracy, especially in large sparse feature spaces, but are expensive or impossible to recover in practice.
- Causal discovery methods optimize structural accuracy rather than predictive performance, creating a misalignment between discovery goals and prediction goals.
- False negatives and false positives in feature selection carry asymmetric predictive costs that standard discovery algorithms do not account for.
- Multiple distinct feature sets can match or exceed oracle boundary performance, suggesting that prediction-focused selection differs fundamentally from finding the true boundary.
- Prediction-aligned feature selection requires new algorithmic approaches rather than standard causal discovery pipelines applied to tabular modeling.