Interpretable Policy Distillation for Power Grid Topology Control
Researchers demonstrate that a deep reinforcement learning policy for power grid control can be compressed into interpretable decision trees and random forests without performance loss. The distilled models match or exceed the original neural network while remaining transparent and deployable on resource-constrained hardware, though they carry topology-specific limitations.
This research addresses a critical gap between AI capability and real-world infrastructure deployment. Power grid operators face pressure to adopt autonomous control systems, yet large neural policies present operational risks: they demand substantial computational resources, are difficult for regulators to audit, and offer little insight into how decisions are made. The study shows that policy distillation, in which a simpler surrogate model is trained to imitate a complex one, can preserve or even improve performance while making the policy verifiable by humans.
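Distillation of this kind can be framed as supervised imitation: sample states, label each with the teacher's chosen action, and fit an interpretable model to those labels. The sketch below illustrates that idea under loudly stated assumptions and is not the study's pipeline. `teacher_policy` is a hypothetical rule standing in for the trained PPO network, the two-feature state (a maximum line-loading ratio and a bus-configuration flag) is invented for illustration, and the student is a small hand-rolled CART-style tree using Gini impurity rather than any particular library's implementation.

```python
import random
from collections import Counter


def teacher_policy(state):
    # Hypothetical stand-in for a trained PPO policy. The state is
    # [max_line_loading, bus_flag]; the return value is a discrete action id.
    loading, bus_flag = state
    if loading > 0.9:
        return 2 if bus_flag else 1
    return 0


def gini(labels):
    # Gini impurity of a list of action labels.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    # Exhaustive search over (feature, threshold) pairs for the split that
    # minimises the size-weighted Gini impurity of the two children.
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [yi for xi, yi in zip(X, y) if xi[f] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def fit_tree(X, y, depth=0, max_depth=3):
    # Returns either a leaf (an action id) or a tuple
    # (feature, threshold, left_subtree, right_subtree).
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    _, f, t = split
    left = [(xi, yi) for xi, yi in zip(X, y) if xi[f] <= t]
    right = [(xi, yi) for xi, yi in zip(X, y) if xi[f] > t]
    return (
        f,
        t,
        fit_tree([s for s, _ in left], [a for _, a in left], depth + 1, max_depth),
        fit_tree([s for s, _ in right], [a for _, a in right], depth + 1, max_depth),
    )


def predict(tree, x):
    # Walk from the root to a leaf and return its action id.
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


# Distillation: sample states, label them with the teacher, fit the student.
random.seed(0)
states = [[random.random() * 1.2, random.randint(0, 1)] for _ in range(500)]
actions = [teacher_policy(s) for s in states]
student = fit_tree(states, actions)

# Fidelity = share of sampled states where student and teacher agree.
fidelity = sum(predict(student, s) == a for s, a in zip(states, actions)) / len(states)
print(f"fidelity: {fidelity:.3f}")
```

Here fidelity is measured on the same states the student was fit to; a fuller evaluation would also roll the student out in the simulator and compare episode rewards and stability, which is what the reported results concern. The resulting tree is a nested tuple of readable threshold tests, which is the property that makes such surrogates auditable.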
The findings come from experiments on Grid2Op, an established platform for power grid simulation research. Stress-focused training on high-loading states proved key: by concentrating on the scenarios where failures are most costly, the team produced policies tuned to the operating conditions that matter most. The PPO (Proximal Policy Optimization) teacher set a strong baseline, and the distilled decision tree and random forest matched or exceeded its rewards at a fraction of its computational cost.
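Stress-focused training amounts to choosing which states the models learn from. A minimal sketch of that selection step follows, assuming a state counts as "high-loading" when any line's loading ratio (flow divided by thermal limit, where 1.0 means the line is at its limit) reaches a threshold. Grid2Op observations expose a comparable per-line ratio as `rho`, but here plain lists stand in for observations. The `threshold=0.9` default and the `keep_normal` share of ordinary states are hypothetical choices; the source does not state the study's criterion or whether any non-stressed states were retained.

```python
import random


def is_stressed(line_loadings, threshold=0.9):
    # A state is "high-loading" if any line's flow/thermal-limit ratio
    # reaches the (hypothetical) threshold.
    return max(line_loadings) >= threshold


def stress_focused_sample(states, threshold=0.9, keep_normal=0.1, seed=0):
    # Keep every stressed state, plus a random share of normal states so the
    # learner still sees ordinary operating conditions (an assumption).
    rng = random.Random(seed)
    selected = []
    for loadings in states:
        if is_stressed(loadings, threshold) or rng.random() < keep_normal:
            selected.append(loadings)
    return selected


# Each state is a list of per-line loading ratios.
states = [[0.5, 0.6], [0.95, 0.2], [0.3, 1.1]]
print(stress_focused_sample(states, keep_normal=0.0))  # only the stressed states
```

Setting `keep_normal` above zero trades some focus on critical states for coverage of routine ones; how that balance was struck in the study is not specified.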
One notable finding is that compression exposed a shift in which features drive decisions: the neural policy relied on line-loading signals, whereas the tree relied on bus-topology variables. This divergence indicates that different decision pathways can produce equivalent control outcomes, which is useful for validation but raises concerns about generalization to other grid configurations.
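Why two policies can agree yet rely on different signals is easiest to see with correlated features. The toy sketch below is an illustration under stated assumptions, not the study's analysis: `policy_loading` and `policy_topology` are hypothetical policies keyed on line loading and on a bus-topology flag respectively, and feature reliance is estimated with a simple permutation test (shuffle one feature across states and count how often the action changes). On "training-like" data where the bus flag happens to track high loading, the two policies agree everywhere; once that correlation is broken, as a network reconfiguration might do, they diverge.

```python
import random


def policy_loading(state):
    # Hypothetical policy keyed on line loading (like the neural teacher).
    loading, _bus = state
    return 1 if loading > 0.9 else 0


def policy_topology(state):
    # Hypothetical policy keyed on a bus-topology flag (like the tree).
    _loading, bus = state
    return 1 if bus == 1 else 0


def permutation_sensitivity(policy, states, feature, seed=0):
    # Fraction of states whose action changes when one feature column is
    # shuffled across states; higher means the policy leans on that feature.
    rng = random.Random(seed)
    column = [s[feature] for s in states]
    rng.shuffle(column)
    changed = 0
    for s, v in zip(states, column):
        perturbed = list(s)
        perturbed[feature] = v
        changed += int(policy(perturbed) != policy(s))
    return changed / len(states)


def agreement(p, q, states):
    # Share of states on which two policies choose the same action.
    return sum(p(s) == q(s) for s in states) / len(states)


rng = random.Random(1)
# Training-like data: the bus flag happens to track high loading exactly.
train = []
for _ in range(1000):
    loading = rng.random() * 1.2
    train.append([loading, 1 if loading > 0.9 else 0])
# "Reconfigured" data: the loading/topology correlation is broken.
shifted = [[rng.random() * 1.2, rng.randint(0, 1)] for _ in range(1000)]

print("agreement on train:", agreement(policy_loading, policy_topology, train))
print("agreement on shifted:", agreement(policy_loading, policy_topology, shifted))
```

The permutation test shows each policy is insensitive to the feature the other one uses, so matching rewards on the training distribution says little about behavior after the correlation between loading and topology changes.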
For infrastructure operators and utilities, this work lowers barriers to AI adoption. Auditable, lightweight controllers reduce deployment friction and can speed regulatory approval. However, deterministic rule-based controllers become risky if policies trained on one topology fail to generalize to network reconfigurations or novel stress scenarios. The research suggests distillation works best within bounded operational domains, not as a universal solution.
- →Decision tree and random forest surrogates matched or exceeded the original PPO neural policy in reward and stability metrics while reducing inference costs substantially.
- →Distilled models revealed representational shifts, with trees prioritizing bus-topology over line-loading signals, highlighting multiple valid control strategies.
- →Transparent tree-based policies remain auditable and deployable on constrained hardware, lowering adoption barriers for real-world grid operations.
- →Stress-focused training on critical high-loading states proved essential for producing policies optimized around infrastructure failure scenarios.
- →Topology-specific generalization risks remain; distilled models may not transfer reliably across different grid configurations or unseen network reconfigurations.