🧠 AI | ⚪ Neutral | Importance 7/10

IndustryCode: A Benchmark for Industry Code Generation

arXiv – CS AI|Puyu Zeng, Zhaoxi Wang, Zhixu Duan, Liang Feng, Shaobo Wang, Cunxiang Wang, Jinghang Wang, Bing Zhao, Hu Wei, Linfeng Zhang|April 6, 2026 at 04:00 AM

🤖 AI Summary

Researchers introduce IndustryCode, which they present as the first comprehensive benchmark for evaluating the code generation capabilities of Large Language Models (LLMs) across multiple industrial domains and programming languages. The benchmark comprises 579 sub-problems drawn from 125 industrial challenges spanning finance, automation, aerospace, and remote sensing. The top-performing model, Claude 4.5 Opus, achieved 68.1% accuracy on sub-problems.

Key Takeaways

→ IndustryCode is the first multi-domain, multi-language benchmark for industrial code generation by LLMs.
→ The benchmark comprises 579 sub-problems derived from 125 primary industrial challenges across finance, automation, aerospace, and remote sensing.
→ It incorporates diverse programming languages including MATLAB, Python, C++, and Stata.
→ Claude 4.5 Opus achieved the highest performance with 68.1% accuracy on sub-problems and 42.5% on main problems.
→ The benchmark addresses gaps in existing evaluations that are limited to single domains and languages.
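
The gap between the sub-problem score (68.1%) and the main-problem score (42.5%) is easier to see with a small sketch. The summary does not say how main problems are scored, so the code below assumes, purely for illustration, that a main problem counts as solved only when every one of its sub-problems passes. The task names and pass/fail values are hypothetical toy data, not results from the benchmark.

```python
from typing import Dict, List


def sub_problem_accuracy(results: Dict[str, List[bool]]) -> float:
    """Fraction of all sub-problems that passed, pooled across main problems."""
    outcomes = [ok for subs in results.values() for ok in subs]
    return sum(outcomes) / len(outcomes)


def main_problem_accuracy(results: Dict[str, List[bool]]) -> float:
    """Fraction of main problems solved.

    ASSUMPTION (not stated in the summary): a main problem is solved
    only if all of its sub-problems pass.
    """
    solved = [all(subs) for subs in results.values()]
    return sum(solved) / len(solved)


# Hypothetical toy data: three main problems with per-sub-problem outcomes.
toy_results = {
    "finance_task": [True, True, True],
    "aerospace_task": [True, False, True, True],
    "remote_sensing_task": [True, True, False],
}

sub_acc = sub_problem_accuracy(toy_results)
main_acc = main_problem_accuracy(toy_results)

print(f"sub-problem accuracy:  {sub_acc:.1%}")
print(f"main-problem accuracy: {main_acc:.1%}")
```

Under that assumed rule, a single failed sub-problem sinks its whole main problem, so the main-problem score can sit well below the sub-problem score even when most individual steps succeed.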

Mentioned in AI

Models

Claude (Anthropic)