AINeutralarXiv – CS AI · 6h ago6/10
🧠
ComBench: A Benchmark for Rigorous Proof Reasoning and Constructive Realization in Olympiad-Level Combinatorics
Researchers introduce ComBench, a new benchmark containing 100 Olympiad-level combinatorics problems designed to evaluate large language models' mathematical reasoning capabilities. The benchmark reveals that even frontier models struggle with combinatorial problems, with the best performance reaching only 65.4%, and identifies that rigorous proof reasoning and constructive problem-solving are distinct capabilities that models handle unevenly.
🧠 GPT-5