AI · Bearish · arXiv - CS AI · 4h ago · 7/10
Is Vibe Coding the Future? An Empirical Assessment of LLM Generated Codes for Construction Safety
Researchers empirically evaluated 450 LLM-generated Python scripts for construction safety and found alarming reliability gaps, including a 45% silent failure rate where code executes but produces mathematically incorrect safety outputs. The study demonstrates that current frontier LLMs lack the deterministic rigor required for autonomous safety-critical engineering applications, necessitating human oversight and governance frameworks.
GPT-4 · Claude · Gemini