AI · Bullish · arXiv – CS AI · 9h ago · 6/10
Evaluation of LLMs for Mathematical Formalization in Lean
Researchers compared how well large language models generate formal mathematical proofs in Lean 4. Gemini 3.1 Pro and Claude Opus 4.7 achieved the highest success rates (92% and 86%, respectively), while NVIDIA Nemotron 3 Super and GPT-OSS 120B were the most cost-efficient, at under $0.01 per correct proof.
🏢 Nvidia · 🧠 Claude · 🧠 Opus