🧠 AI | ⚪ Neutral | Importance 7/10

SorryDB: Can AI Provers Complete Real-World Lean Theorems?

arXiv – CS AI | Austin Letson, Leopoldo Sarra, Auguste Poiroux, Oliver Dressler, Paul Lezeau, Dhyan Aranha, Frederick Pu, Aaron Hill, Miguel Corredera Hidalgo, Julian Berman, George Tsoukalas, Lenny Taelman | March 4, 2026 at 05:00 AM

🤖 AI Summary

Researchers have introduced SorryDB, a dynamic benchmark for evaluating AI systems' ability to prove mathematical theorems using the Lean proof assistant. The benchmark draws from 78 real-world formalization projects and addresses limitations of static benchmarks by providing continuously updated tasks that better reflect community needs.
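For readers unfamiliar with Lean: `sorry` is the keyword that marks a proof as unfinished, which is presumably where the benchmark gets its name. The sketch below is a hypothetical toy example, not taken from SorryDB or from any of the 78 projects. It only illustrates what completing an open proof looks like, assuming a task amounts to replacing a `sorry` with a proof that Lean accepts.

```lean
-- Hypothetical toy example; the theorem names are made up for illustration.

-- An open task: the statement is fixed, the proof is a `sorry` placeholder.
-- Lean accepts this file but warns that the declaration uses `sorry`.
theorem toy_add_zero (n : Nat) : n + 0 = n := by
  sorry

-- A completed version: the placeholder is replaced by an actual proof term.
theorem toy_add_zero_done (n : Nat) : n + 0 = n :=
  Nat.add_zero n
```

Tasks from real formalization projects are typically much harder than this, since they depend on project-specific definitions and lemmas rather than on a single library fact.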

Key Takeaways

→ SorryDB is a new dynamic benchmark for testing AI theorem provers on real-world Lean formalization tasks.
→ The benchmark draws its tasks from 78 GitHub projects, offering more practical challenges than traditional competition problems.
→ Current AI approaches, including LLMs and symbolic provers, show complementary strengths; no single approach dominates.
→ Agentic approaches using Gemini Flash performed best overall but did not outperform the other methods across the board.
→ Because the tasks are continuously updated, the benchmark helps prevent test-set contamination and supports ongoing evaluation of AI for formal mathematics.