KG-SoftMAP: Soft Knowledge-Graph Priors for Bayesian Network Structure Learning from Sparse Discrete Data
KG-SoftMAP is a novel machine learning method that improves Bayesian network structure learning from sparse discrete data by integrating imperfect domain knowledge as weighted soft priors. The approach combines expert-curated or LLM-extracted knowledge graphs with statistical scoring, demonstrating superior structure recovery on synthetic benchmarks and practical utility on real educational datasets.
KG-SoftMAP addresses a fundamental challenge in causal inference: learning directed acyclic graph (DAG) structures when observational data is sparse. Traditional structure learning methods fail when variable pairs lack enough joint observations for reliable statistical scoring, a common constraint in domains like education where data collection is expensive or incomplete. The method bridges the gap between purely data-driven approaches and human domain expertise by encoding knowledge graphs as confidence-weighted priors that guide but do not dictate structure recovery.
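To make the idea of a confidence-weighted, overridable prior concrete, here is a minimal sketch. It assumes an independent per-edge prior in which each knowledge-graph edge carries a confidence in (0, 1), and it adds the resulting log prior to a data log-score. The function names, the `weight` parameter, and the numbers are hypothetical illustrations, not the paper's actual formulation.

```python
import math

def soft_prior_log_score(edges, kg_confidence):
    """Log prior of a candidate edge set under independent per-edge beliefs.

    Assumption: each knowledge-graph edge has a confidence p in (0, 1).
    Edges the knowledge graph does not mention are treated as neutral.
    """
    total = 0.0
    for edge, p in kg_confidence.items():
        if not 0.0 < p < 1.0:
            raise ValueError("confidence must lie strictly between 0 and 1")
        # Reward including a believed edge, penalize omitting it.
        total += math.log(p) if edge in edges else math.log(1.0 - p)
    return total

def map_score(data_log_score, edges, kg_confidence, weight=1.0):
    """Data score plus weighted soft prior (a MAP-style objective)."""
    return data_log_score + weight * soft_prior_log_score(edges, kg_confidence)

# Hypothetical knowledge graph: one edge believed with confidence 0.8.
kg = {("A", "B"): 0.8}

# Case 1: the data strongly disfavors the edge, so the prior is overridden.
with_edge = map_score(-105.0, {("A", "B")}, kg)
without_edge = map_score(-100.0, set(), kg)
print(with_edge < without_edge)

# Case 2: the data is nearly indifferent, so the prior tips the decision.
with_edge_weak = map_score(-100.5, {("A", "B")}, kg)
print(with_edge_weak > without_edge)
```

Because the prior enters as a finite log-odds term rather than a hard constraint, enough contrary evidence in the data score always outweighs it, which is the sense in which the prior guides without dictating.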
The significance lies in its practical flexibility. The method treats knowledge sources, whether domain experts or large language models, as soft constraints rather than hard requirements, allowing data to override incorrect prior beliefs. On synthetic benchmarks with ground-truth DAGs, KG-SoftMAP recovers meaningful partial structure even at very sparse observation rates (ρ=0.05), substantially improving on baselines. Performance scales gracefully with data availability, with DF1 between 0.46 and 0.96 at moderate sparsity levels.
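The summary does not define DF1. A minimal sketch follows, assuming it denotes an F1 score over directed edges, where an edge counts as recovered only if both its endpoints and its direction match the ground truth. The function name and the example graphs are hypothetical.

```python
def directed_f1(true_edges, learned_edges):
    """F1 over directed edges; a reversed edge counts as a miss.

    Assumption: this is one plausible reading of "DF1", not a
    definition taken from the paper.
    """
    true_edges, learned_edges = set(true_edges), set(learned_edges)
    true_positives = len(true_edges & learned_edges)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(learned_edges)
    recall = true_positives / len(true_edges)
    return 2 * precision * recall / (precision + recall)

# Hypothetical ground truth and learned graph; B -> C was learned reversed.
truth = {("A", "B"), ("B", "C"), ("C", "D")}
learned = {("A", "B"), ("C", "B"), ("C", "D")}
print(directed_f1(truth, learned))
```

Under this reading, two of three edges match in direction, so precision and recall are both 2/3.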
For real-world deployment, the method's value extends beyond pure accuracy. On educational datasets, the learned Bayesian networks function as interpretable diagnostic models: they provide calibrated joint probability estimates and support inference from arbitrary observation subsets, capabilities that discriminative baselines like logistic regression cannot match. The work is candid about its limitations: when knowledge graphs are unreliable or absent, simpler discriminative methods are preferable.
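Inference from an arbitrary subset of observations can be illustrated with a toy network. The sketch below uses brute-force enumeration over a hypothetical three-node chain (Prereq -> Skill -> Item) with made-up conditional probabilities. It is meant only to show why a joint model can answer any conditional query, and is not the networks or the inference procedure used in the work.

```python
from itertools import product

# Hypothetical binary network: Prereq -> Skill -> Item (1 = mastered/correct).
PARENTS = {"Prereq": (), "Skill": ("Prereq",), "Item": ("Skill",)}

# Made-up tables: P(node = 1 | parent values).
CPT = {
    "Prereq": {(): 0.6},
    "Skill": {(0,): 0.2, (1,): 0.7},
    "Item": {(0,): 0.25, (1,): 0.9},
}

def joint(assignment):
    """Joint probability of a full assignment via the chain rule."""
    prob = 1.0
    for node, parents in PARENTS.items():
        p_one = CPT[node][tuple(assignment[p] for p in parents)]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob

def query(target, evidence):
    """P(target = 1 | evidence), summing over all unobserved nodes."""
    hidden = [n for n in PARENTS if n not in evidence]
    numerator = denominator = 0.0
    for values in product((0, 1), repeat=len(hidden)):
        assignment = {**evidence, **dict(zip(hidden, values))}
        p = joint(assignment)
        denominator += p
        if assignment[target] == 1:
            numerator += p
    return numerator / denominator

# Any subset of nodes can serve as evidence, including none at all.
print(query("Skill", {}))
print(query("Skill", {"Item": 1}))
print(query("Prereq", {"Item": 1}))
```

Enumeration is exponential in the number of unobserved nodes, so it only suits small examples. The point is that the same model answers all three queries, whereas a discriminative classifier is trained for one fixed input-output mapping.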
This work signals growing maturity in hybrid AI approaches that leverage both human expertise and machine learning. The demonstration of LLM-extracted knowledge graphs as viable prior sources opens pathways for scaling structured learning across domains where ground-truth knowledge exists but is imperfectly articulated.
- KG-SoftMAP enables Bayesian network learning from sparse data by encoding knowledge graphs as overridable soft priors rather than hard constraints.
- The method demonstrates 3-7x improvement in directed structure recovery compared to data-only baselines on sparse synthetic benchmarks.
- LLM-extracted knowledge graphs are viable sources for prior construction, expanding accessibility beyond expert-curated knowledge.
- Real-world performance on educational data shows the learned networks provide interpretable diagnostics with calibrated probabilities, though they slightly underperform pure discriminative models on classification tasks.
- Graceful degradation occurs as knowledge graph quality declines, suggesting robustness to imperfect domain knowledge sources.