Semigroup Consistency as a Diagnostic for Learned Physics Simulators
Researchers propose semigroup consistency as a diagnostic for learned physics simulators. It checks whether evolving a state directly over a combined horizon agrees with composing evolutions over shorter horizons. Experiments on heat and Burgers dynamics show a strong correlation between semigroup error and long-horizon rollout degradation, while using semigroup regularization as a training objective yields mixed results.
Learned physics simulators built on neural networks have become increasingly important for scientific computing and engineering, but methods for evaluating them remain limited. Existing metrics focus on one-step or short-horizon prediction accuracy, which can mask failures that only appear as errors compound over longer timescales. This work addresses that gap by introducing semigroup consistency as a post hoc evaluation metric grounded in the mathematics of dynamical systems.
The semigroup property holds for autonomous, state-complete systems, whose evolution operator Φ satisfies a composition rule: evolving a system for s + t timesteps yields the same result as evolving it for s steps and then t steps, that is, Φ_{s+t} = Φ_t ∘ Φ_s. The true dynamics satisfy this identity exactly, but learned approximations often violate it because of approximation error and model limitations. By measuring the normalized semigroup error, the discrepancy between direct and composed evolution, researchers can detect when a simulator fails at temporal composition without running expensive long-horizon experiments.
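A minimal sketch of the check follows, assuming a hypothetical simulator interface `model(u, t)` that predicts the state after horizon t in a single call. It computes E = ‖Φ̂_{s+t}(u) − Φ̂_t(Φ̂_s(u))‖ / ‖Φ̂_{s+t}(u)‖. Normalizing by the norm of the direct prediction is an assumption, not necessarily the paper's definition. The 1D periodic heat equation serves as a test case: `exact_heat` damps each discrete Fourier mode and is therefore an exact semigroup, while `one_shot_heat` is a hypothetical stand-in for an imperfect learned model that covers the full horizon with one explicit Euler jump. The diffusivity `NU` and grid spacing `DX` are illustrative values.

```python
import cmath
import math

NU = 0.5  # diffusivity (illustrative value)
DX = 1.0  # grid spacing (illustrative value)


def laplacian(u):
    """Second-order finite-difference Laplacian on a periodic grid."""
    n = len(u)
    return [(u[i - 1] - 2.0 * u[i] + u[(i + 1) % n]) / DX**2 for i in range(n)]


def exact_heat(u, t):
    """Exact flow of du/dt = NU * laplacian(u): damp each discrete Fourier mode."""
    n = len(u)
    coeffs = [sum(u[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
              for k in range(n)]
    # Eigenvalue of the periodic Laplacian for mode k is -4 sin^2(pi k / n) / DX^2.
    damped = [c * math.exp(-4.0 * NU * t * math.sin(math.pi * k / n) ** 2 / DX**2)
              for k, c in enumerate(coeffs)]
    return [sum(damped[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)).real / n
            for m in range(n)]


def one_shot_heat(u, t):
    """Hypothetical imperfect model: a single explicit Euler jump over horizon t."""
    return [ui + NU * t * li for ui, li in zip(u, laplacian(u))]


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


def semigroup_error(model, u0, s, t, eps=1e-12):
    """Normalized gap between direct evolution over s + t and evolving s, then t."""
    direct = model(u0, s + t)
    composed = model(model(u0, s), t)
    gap = l2_norm([a - b for a, b in zip(direct, composed)])
    return gap / max(l2_norm(direct), eps)


if __name__ == "__main__":
    n = 32
    u0 = [math.sin(2 * math.pi * m / n) + 0.5 * math.cos(6 * math.pi * m / n)
          for m in range(n)]
    # Round-off level: the exact flow satisfies Phi_{s+t} = Phi_t o Phi_s.
    print(semigroup_error(exact_heat, u0, 1.0, 1.0))
    # Clearly nonzero: with A = NU * laplacian, (I + tA)(I + sA) differs
    # from I + (s + t)A by s * t * A^2.
    print(semigroup_error(one_shot_heat, u0, 1.0, 1.0))
```

A one-step autoregressive model applied for an integer number of steps passes this check trivially, because the direct and composed paths execute the same sequence of steps. The check carries information only when the two paths are computed differently, for example when the model takes the horizon as an input.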
The empirical results suggest practical value: a trajectory-level correlation of ρ = 0.635 between semigroup error and actual rollout degradation indicates that the metric is a meaningful signal for identifying problematic models. In contrast, the mixed results of semigroup regularization as a training objective suggest that the approach works better as a diagnostic than as a training constraint. The distinction matters for practitioners who want efficient evaluation without restructuring existing training pipelines.
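The summary does not say which correlation coefficient produced ρ = 0.635. The sketch below assumes Spearman's rank correlation over per-trajectory pairs, where each trajectory contributes one semigroup error and one rollout-degradation score computed beforehand; how degradation is scored is left open here.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two entries")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Given two hypothetical lists `semigroup_errors` and `rollout_errors` with one entry per trajectory, `spearman(semigroup_errors, rollout_errors)` returns the rank correlation. Working with ranks makes the statistic insensitive to the very different scales the two error measures may have.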
The work contributes a mathematically principled, computationally cheap diagnostic that researchers and engineers can apply to learned simulators of autonomous, state-complete systems regardless of architecture. Because the check is model-agnostic, it supports quality assurance in scientific machine learning applications where long-horizon accuracy is critical.
- Semigroup consistency detects temporal composition failures in learned physics simulators that standard short-horizon metrics miss.
- A strong correlation (ρ = 0.635) between normalized semigroup error and rollout degradation supports the diagnostic's practical utility.
- The metric works as a post hoc evaluation tool with minimal computational overhead and does not depend on model architecture.
- Semigroup regularization during training shows mixed effectiveness, which limits its value as a training objective despite its theoretical appeal.
- The research gives practitioners a mathematically grounded quality assurance method for scientific machine learning applications.
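To make the regularization variant in the fourth bullet concrete, one plausible objective adds a consistency penalty to an ordinary data loss. This is a hedged illustration, not the study's loss: the hypothetical `model(u, h)` interface, the reference `target` at time s + t, the mean-squared-error terms, and the `weight` hyperparameter are all assumptions.

```python
def mse(a, b):
    """Mean squared difference between two equal-length states."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def semigroup_regularized_loss(model, u0, target, s, t, weight=0.1):
    """Data loss on the direct (s + t) prediction plus a weighted consistency penalty.

    `model(u, h)` is a hypothetical simulator interface that predicts the state
    after horizon h; `target` is a hypothetical reference state at time s + t.
    """
    direct = model(u0, s + t)
    composed = model(model(u0, s), t)  # evolve for s, then for t
    data_term = mse(direct, target)
    consistency_term = mse(direct, composed)
    return data_term + weight * consistency_term
```

In an actual training setup both terms would be differentiated with respect to the model parameters; whether gradients should flow through both the direct and the composed branch is a design choice this sketch leaves open.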