AINeutralarXiv – CS AI · 15h ago6/10
🧠
ORLoopBench: Solver-in-the-Loop Benchmarks for Self-Correction and Behavioral Rationality in Operations Research
Researchers introduce ORLoopBench, a benchmark suite that evaluates large language models on Operations Research tasks through an iterative solver-in-the-loop process rather than one-shot code generation. The framework enables models to debug infeasible mathematical models by inspecting constraint conflicts and repairing formulations, with an 8B model achieving 95.3% success on LP repair tasks—outperforming frontier APIs at 92.4%.