AI · Neutral · arXiv – CS AI · 6h ago · 6/10
Evaluating Research-Level Math Proofs via Strict Step-Level Verification
Researchers developed a step-level verification framework for using Large Language Models to evaluate complex mathematical proofs. Instead of judging a proof globally, the framework keeps detailed context for each deduction and constrains the sources from which theorems may be cited. In tests on research-level proofs, unconstrained approaches failed to catch subtle logical errors; under the new method, the remaining verification failures stemmed from implicit domain conventions rather than hallucinations.
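
To make the idea concrete, here is a minimal Python sketch of what step-level checking with a constrained theorem list could look like. It is an illustration under stated assumptions, not the paper's implementation: the names `Step`, `StepResult`, `verify_proof`, and `toy_check` are hypothetical, and `toy_check` is a trivial stand-in for the LLM judgment call that a real system would make.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Step:
    """One deduction in a proof (hypothetical structure)."""
    claim: str
    cited_theorems: List[str] = field(default_factory=list)


@dataclass
class StepResult:
    index: int
    ok: bool
    reason: str


def verify_proof(
    steps: List[Step],
    allowed_sources: Set[str],
    check_step: Callable[[List[str], Step], bool],
) -> List[StepResult]:
    """Check each step on its own, given only the claims made before it."""
    results: List[StepResult] = []
    context: List[str] = []  # claims stated so far, in order
    for i, step in enumerate(steps):
        # Constrain theorem sources: reject citations outside the allowed set.
        outside = [t for t in step.cited_theorems if t not in allowed_sources]
        if outside:
            reason = "theorem outside allowed sources: " + ", ".join(outside)
            results.append(StepResult(i, False, reason))
        elif not check_step(context, step):
            results.append(StepResult(i, False, "deduction not justified by context"))
        else:
            results.append(StepResult(i, True, "ok"))
        # Assumption: keep going after a failure so every faulty step is located.
        context.append(step.claim)
    return results


def toy_check(context: List[str], step: Step) -> bool:
    """Stand-in for an LLM call: flags steps that hand-wave with 'obviously'."""
    return "obviously" not in step.claim.lower()


steps = [
    Step("f is uniformly continuous on [0, 1]", ["Heine-Cantor"]),
    Step("Obviously f is bounded"),
    Step("f attains its maximum", ["Unpublished Lemma X"]),
]
allowed = {"Heine-Cantor", "Extreme Value Theorem"}
results = verify_proof(steps, allowed, toy_check)

for r in results:
    print(r.index, r.ok, r.reason)
```

The design choice this sketch highlights is that each step receives its own verdict and reason, so a failure points at a specific deduction or citation instead of yielding a single pass/fail for the whole proof.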