AI · Neutral · arXiv – CS AI · 3h ago · 6/10
OR-Space: A Full-Lifecycle Workspace Benchmark for Industrial Optimization Agents
Researchers introduce OR-Space, a benchmark for evaluating large language model agents in industrial operations research workflows. Unlike existing benchmarks that focus on single-stage problem translation, OR-Space tests agents in persistent multi-artifact workspaces across three task modes: building optimization models, revising them under changing requirements, and explaining solutions. The aim is to assess how reliable and practically ready such agents are for real-world use.