DISCO: Mitigating Bias in Deep Learning with Conditional Distance Correlation
Researchers introduce DISCO, a machine learning framework that uses conditional distance correlation to mitigate dataset bias in deep learning models. By grounding the approach in causal theory through the Standard Anti-Causal Model (SAM), the method achieves competitive performance across six datasets while requiring fewer hyperparameters than existing bias mitigation techniques.
This research addresses a fundamental challenge in machine learning: dataset bias that causes models to learn spurious correlations rather than genuine task-relevant patterns. The introduction of the Standard Anti-Causal Model provides a theoretical foundation for understanding how bias mechanisms operate, establishing conditional independence as a criterion for causal stability. This theoretical grounding distinguishes DISCO from purely empirical approaches to bias mitigation.
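To make the conditional independence criterion concrete, it can be written as a short formula. This is only one plausible formalization: the symbols $Y$ (target label), $B$ (bias attribute), $\hat{Y}$ (model prediction), $\mathcal{L}_{\text{task}}$ (task loss) and the weight $\lambda$ are assumptions introduced here, not notation taken from the paper. The first line uses the known property that conditional distance correlation (CDCor) vanishes exactly when two variables with finite first moments are conditionally independent.

```latex
% Assumed symbols, not the paper's notation:
%   Y = target label, B = bias attribute, \hat{Y} = f_\theta(X) = model prediction.
\[
  \hat{Y} \perp\!\!\!\perp B \mid Y
  \quad\Longleftrightarrow\quad
  \mathrm{CDCor}(\hat{Y}, B \mid Y) = 0 \ \text{almost surely}
\]
% A regularized training objective then penalizes remaining conditional dependence;
% \lambda \ge 0 is a hypothetical trade-off weight.
\[
  \min_{\theta} \; \mathcal{L}_{\text{task}}(\theta)
  + \lambda \, \mathrm{CDCor}\bigl(f_\theta(X), B \mid Y\bigr)
\]
```

Read this way, a model is stable with respect to the bias when, among samples that share the same label, its predictions carry no further information about the bias attribute.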
The practical contribution centers on two efficient estimators, DISCO_m and sDISCO, that implement conditional distance correlation as a regularizer within gradient-based optimization. Both are reported to extend to multi-bias scenarios, addressing a limitation of prior methods that often struggle when several bias sources interact. Across six datasets the methods perform competitively with existing approaches, which suggests the approach generalizes rather than overfitting to the characteristics of any one dataset.
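The idea of penalizing conditional dependence can be illustrated with a small sketch. The code below is not DISCO_m or sDISCO. It assumes a discrete label and approximates the conditional statistic by computing ordinary sample distance correlation inside each label group and averaging by group size. The function names and the weight `lam` are hypothetical, and plain Python lists are used for readability; a real regularizer would be written in a differentiable tensor library.

```python
import math
from collections import defaultdict


def _centered_distances(xs):
    """Pairwise Euclidean distances, double-centered (row, column, grand mean)."""
    n = len(xs)
    d = [[math.dist(a, b) for b in xs] for a in xs]
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    # The distance matrix is symmetric, so row means equal column means.
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(xs, zs):
    """Sample distance correlation between two equally long lists of vectors."""
    a, b = _centered_distances(xs), _centered_distances(zs)
    n = len(xs)

    def mean_prod(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2, var_x, var_z = mean_prod(a, b), mean_prod(a, a), mean_prod(b, b)
    denom = math.sqrt(var_x * var_z)
    if denom == 0.0:  # one side is constant, so there is no dependence to measure
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / denom)


def stratified_conditional_dcor(preds, bias, labels):
    """Assumed stand-in for a conditional statistic: size-weighted mean of the
    distance correlation between predictions and bias within each label group."""
    groups = defaultdict(list)
    for p, z, y in zip(preds, bias, labels):
        groups[y].append((p, z))
    total, weight = 0.0, 0
    for members in groups.values():
        if len(members) < 2:  # a single sample carries no dependence information
            continue
        ps = [m[0] for m in members]
        zs = [m[1] for m in members]
        total += len(members) * distance_correlation(ps, zs)
        weight += len(members)
    return total / weight if weight else 0.0


def penalized_loss(task_loss, preds, bias, labels, lam=1.0):
    """Task loss plus the dependence penalty; `lam` is a hypothetical weight."""
    return task_loss + lam * stratified_conditional_dcor(preds, bias, labels)


labels = [0, 0, 0, 0, 1, 1, 1, 1]
bias = [(0.0,), (1.0,), (0.0,), (1.0,), (0.0,), (1.0,), (0.0,), (1.0,)]
# Predictions that follow the bias attribute inside each class.
biased_preds = [(0.1,), (0.9,), (0.1,), (0.9,), (0.2,), (0.8,), (0.2,), (0.8,)]
# Predictions that depend only on the label.
stable_preds = [(0.1,)] * 4 + [(0.9,)] * 4

print(stratified_conditional_dcor(biased_preds, bias, labels))  # close to 1
print(stratified_conditional_dcor(stable_preds, bias, labels))  # 0.0
```

Because the distances are taken between vectors, several bias attributes can be stacked into one tuple per sample without changing the code. That is a property of this sketch and not a claim about how DISCO handles multi-bias settings.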
For the machine learning and AI development community, this work offers both theoretical clarity and practical tools. Because the methods require fewer hyperparameters, practitioners have less to tune, which reduces computational overhead during model development. The open-source release lets others adopt and validate the approach in different domains. More broadly, the work integrates causal reasoning with deep learning as a step toward more robust and interpretable models.
Future developments will likely focus on extending these methods to larger-scale models, applying conditional distance correlation to architectures beyond standard neural networks, and validating performance in domains where bias mitigation is critical, such as healthcare, criminal justice, and hiring systems.
- DISCO introduces conditional distance correlation regularization for mitigating dataset bias in deep learning, with theoretical grounding in causal frameworks.
- The Standard Anti-Causal Model (SAM) provides a unifying theoretical foundation for characterizing bias mechanisms and establishing causal stability criteria.
- The methods achieve competitive performance across six datasets while requiring fewer hyperparameters than existing bias mitigation approaches.
- The approach scales efficiently to multi-bias scenarios where multiple bias sources interact simultaneously.
- The open-source implementation enables rapid adoption and integration into existing machine learning pipelines.