Rethinking Scientific Modeling: Toward Physically Consistent and Simulation-Executable Programmatic Generation
Researchers propose a framework for generating physically consistent structural engineering code using large language models, introducing the CivilInstruct dataset and the MBEval benchmark to reduce hallucinations and ensure simulation-ready outputs. The approach combines domain knowledge, constraint-oriented alignment, and verification-driven evaluation to overcome current limitations in automated building modeling.
Large language models have shown promise in automating code generation across various domains, yet their application to safety-critical engineering faces significant obstacles. This research addresses a fundamental gap: LLMs frequently produce non-executable or physically inconsistent code when tasked with structural modeling, where precision directly impacts simulation validity and real-world safety. The proposed framework tackles this through three integrated mechanisms: domain knowledge construction that embeds engineering principles, constraint-oriented model alignment that enforces API compliance and specification adherence, and verification-driven evaluation that validates both executability and structural dynamics consistency.
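To make "API compliance" concrete, the sketch below statically scans a piece of generated modeling code and reports any called function that is not on an allowed list. It is an illustration only, not the paper's method: the whitelist `ALLOWED_CALLS`, the helper `find_disallowed_calls`, and the `model.*` calls in the sample are all hypothetical names chosen for the example.

```python
import ast

# Hypothetical whitelist of modeling calls (an assumption for this sketch,
# not the API used in the paper).
ALLOWED_CALLS = {"node", "element", "mass", "fix"}


def find_disallowed_calls(source, allowed=ALLOWED_CALLS):
    """Return the sorted names of called functions that are not whitelisted.

    Raises SyntaxError if the generated source does not parse at all.
    """
    tree = ast.parse(source)
    bad = set()
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Attribute):   # e.g. model.node(...)
            name = func.attr
        elif isinstance(func, ast.Name):      # e.g. node(...)
            name = func.id
        else:
            continue                          # e.g. calls on subscripts
        if name not in allowed:
            bad.add(name)
    return sorted(bad)


# Hypothetical generated snippet: the last call is not in the whitelist.
generated = """
model.node(1, 0.0, 0.0)
model.node(2, 0.0, 3.0)
model.fix(1, 1, 1, 1)
model.beam_column(1, 1, 2)
"""

print(find_disallowed_calls(generated))  # → ['beam_column']
```

A static check like this catches hallucinated function names before any simulation is run, but it says nothing about whether the arguments are physically sensible; that requires executing the model and inspecting its response.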
The introduction of CivilInstruct as a domain-specific dataset is a methodological step toward adapting LLM behavior to specialized technical domains. Rather than relying on general-purpose models, the researchers employ two-stage fine-tuning to progressively enforce constraint satisfaction, substantially reducing the hallucinated outputs that plague existing approaches. MBEval's closed-loop validation methodology establishes measurable benchmarks for physical consistency, moving beyond surface-level code quality metrics.
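The idea of closed-loop validation, as opposed to surface-level code metrics, can be sketched in a few lines: run the generated code, then compare a physical quantity derived from the result against a reference value. The example below is a minimal sketch under strong assumptions and does not reproduce MBEval. It assumes the generated code defines two variables, `mass` and `stiffness`, for a single-degree-of-freedom system, and checks the undamped natural period T = 2π·sqrt(m/k) against a reference. The function names, variable names, and tolerance are all hypothetical.

```python
import math


def run_generated(source):
    """Execute generated code in a fresh namespace and return that namespace.

    Sketch only: a real harness would sandbox execution and call a simulator.
    """
    namespace = {}
    exec(source, namespace)
    return namespace


def natural_period(k, m):
    """Undamped single-degree-of-freedom period, T = 2*pi*sqrt(m/k)."""
    return 2.0 * math.pi * math.sqrt(m / k)


def check_sample(source, reference_period, rel_tol=0.05):
    """Two-level check: does the code run, and is the period physically consistent?"""
    try:
        ns = run_generated(source)
    except Exception as exc:  # syntax errors and runtime errors both land here
        return {"executable": False, "consistent": False, "error": repr(exc)}

    k, m = ns.get("stiffness"), ns.get("mass")
    valid = all(isinstance(v, (int, float)) and v > 0 for v in (k, m))
    if not valid:
        return {"executable": True, "consistent": False,
                "error": "missing or non-positive mass/stiffness"}

    period = natural_period(k, m)
    consistent = math.isclose(period, reference_period, rel_tol=rel_tol)
    return {"executable": True, "consistent": consistent, "period": period}


# Hypothetical samples: one valid, one non-physical, one that does not parse.
good = "mass = 1000.0\nstiffness = 4.0e5\n"
negative_k = "mass = 1000.0\nstiffness = -4.0e5\n"
broken = "mass = = 1000.0\n"

for sample in (good, negative_k, broken):
    print(check_sample(sample, reference_period=0.314))
```

The second sample shows why executability alone is not enough: the code runs without error, yet a negative stiffness is physically meaningless and is caught only by the consistency check.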
This work carries implications for the broader intersection of AI and engineering automation. As infrastructure projects increasingly rely on computational modeling, ensuring that automated code generation produces physically valid simulations becomes critical for adoption. The framework's success in reducing non-conforming outputs could accelerate deployment of AI-assisted modeling tools in civil and structural engineering workflows, lowering costs while maintaining safety standards. The open-source release of code and datasets signals potential for community-driven expansion across other engineering domains requiring similar verification rigor.
- LLM-generated structural modeling code frequently violates physical constraints and engineering specifications, limiting practical applicability in simulations.
- A constraint-oriented fine-tuning strategy combined with domain-specific datasets significantly reduces hallucinated and non-executable outputs.
- Verification-driven evaluation frameworks are essential for validating AI-generated engineering code in safety-critical applications.
- The CivilInstruct dataset and MBEval benchmark establish new standards for measuring physical consistency in automated code generation.
- Open-source release enables broader adoption of physics-consistent AI modeling across civil engineering and related disciplines.