Mask2Cause: Causal Discovery via Adjacency Constrained Causal Attention
Researchers introduce Mask2Cause, a deep learning framework that discovers causal relationships in time series data by building causal graph extraction directly into the forecasting process. The method achieves state-of-the-art results while using over 70% fewer model parameters, on average, than existing approaches.
Mask2Cause addresses a fundamental challenge in machine learning: extracting true causal relationships from temporal data without mistaking spurious correlations for causes. Existing neural approaches fall short in one of two ways. Component-wise architectures model each variable separately and can miss the dynamics of an interconnected system, while post-hoc methods extract a graph from an already-trained forecaster and risk overfitting to false patterns. Mask2Cause avoids both with an end-to-end architecture that recovers the causal structure during the forecasting forward pass itself.
The framework's innovation lies in two core mechanisms: an Inverted Variable Embedding that reframes how variables interact, and an Adjacency-Constrained Masked Attention mechanism that directly constrains which relationships the model can learn. By training with both homoscedastic and heteroscedastic objectives, the method captures causal influences on the variance of each series as well as on its mean, yielding a richer picture of the causal structure.
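To make the two mechanisms concrete, here is a minimal NumPy sketch written under our own assumptions; it is not the authors' implementation. We assume that "inverted" means each variable's whole lookback window is embedded as a single token, as in inverted-Transformer forecasters, so attention runs across variables rather than across time steps, and that the adjacency matrix acts as a hard attention mask. How Mask2Cause parameterizes and learns the adjacency is not covered here, so the sketch simply takes a fixed boolean matrix. All function and weight names, the example graph, and the rule that a variable may always attend to itself are hypothetical.

```python
import numpy as np


def inverted_embedding(x, w_embed):
    # Hypothetical "inverted" embedding (assumption): x has shape (T, N) for
    # N variables over T steps; w_embed has shape (T, d). Each variable's
    # whole window becomes one d-dimensional token, giving shape (N, d).
    return x.T @ w_embed


def adjacency_masked_attention(tokens, w_q, w_k, w_v, adjacency):
    # adjacency[i, j] = True lets target variable i attend to variable j,
    # read here as a candidate edge j -> i. Self-attention is always allowed
    # (an assumption) so every row keeps at least one finite score.
    n = tokens.shape[0]
    allowed = adjacency | np.eye(n, dtype=bool)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = (q @ k.T) / np.sqrt(k.shape[1])
    scores = np.where(allowed, scores, -np.inf)   # block disallowed edges
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)                      # exp(-inf) is exactly 0
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
T, N, d = 16, 3, 8
x = rng.normal(size=(T, N))
w_embed = rng.normal(size=(T, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

# Hypothetical graph for illustration: variable 0 drives 1, variable 1 drives 2.
adjacency = np.zeros((N, N), dtype=bool)
adjacency[1, 0] = True
adjacency[2, 1] = True

out, weights = adjacency_masked_attention(
    inverted_embedding(x, w_embed), w_q, w_k, w_v, adjacency)
```

Because blocked scores are set to minus infinity before the softmax, their attention weights are exactly zero, so the representation of variable `i` can only draw on variables the adjacency matrix lists as its candidate parents. In a setup like this, the mask itself can be read as the causal graph.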
For practitioners, the implications are substantial. Cutting parameters by over 70% on average while maintaining accuracy translates into faster inference, lower computational cost, and improved interpretability, all of which matter when deploying causal models in production. Consistent performance across synthetic chaotic systems and realistic biological simulations suggests genuine robustness rather than benchmark-specific engineering.
This advance particularly benefits fields that need interpretability alongside predictive accuracy, such as climate modeling, financial forecasting, and biological systems, where understanding causality matters as much as making predictions. The reduced complexity also enables deployment on resource-constrained devices, broadening access to causal discovery tools beyond well-funded research institutions.
- Mask2Cause integrates causal graph discovery directly into forecasting rather than applying post-hoc extraction, reducing overfitting risk
- The framework reduces model parameters by over 70% on average while maintaining predictive accuracy compared to baseline methods
- Dual training objectives (homoscedastic and heteroscedastic) capture causal influences in both mean and variance of time series data
- Strong performance across synthetic chaotic systems and biological simulations demonstrates generalization beyond standard benchmarks
- Lower computational complexity improves interpretability and enables deployment in resource-constrained environments
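The dual-objective point can be illustrated with one plausible reading of the two losses, offered as an assumption rather than the paper's exact formulation: homoscedastic training fits only the mean with a fixed, shared noise level (equivalent, up to constants, to mean squared error), while heteroscedastic training also predicts a per-step variance and minimizes the Gaussian negative log-likelihood. In the toy data below, a hypothetical parent `z` changes only the spread of `y`, not its mean. The function names and data are illustrative.

```python
import numpy as np


def homoscedastic_loss(y, mean):
    # Fixed, shared noise variance (assumption): the Gaussian NLL reduces to
    # mean squared error up to constants, so only the mean is scored.
    return float(np.mean((y - mean) ** 2))


def heteroscedastic_loss(y, mean, log_var):
    # Model also predicts a per-step log-variance (assumption): Gaussian NLL
    # with the constant 0.5 * log(2 * pi) dropped.
    return float(np.mean(0.5 * (log_var + (y - mean) ** 2 / np.exp(log_var))))


rng = np.random.default_rng(1)
z = rng.choice([0.5, 3.0], size=10_000)   # hypothetical parent sets the noise scale
y = z * rng.normal(size=z.size)           # mean of y is 0 regardless of z
mean = np.zeros_like(y)                   # best mean forecast, with or without z

mse = homoscedastic_loss(y, mean)
nll_const = heteroscedastic_loss(y, mean, np.log(np.var(y)))  # ignores z
nll_parent = heteroscedastic_loss(y, mean, np.log(z ** 2))    # uses z
```

Under the homoscedastic loss the edge from `z` to `y` is invisible: knowing `z` does not change the best mean forecast, so the squared error is the same either way. Only the heteroscedastic loss rewards a model that uses `z`, because predicting the correct variance lowers the negative log-likelihood.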