AI · Neutral · arXiv – CS AI · Mar 47/104
SorryDB: Can AI Provers Complete Real-World Lean Theorems?
Researchers have introduced SorryDB, a dynamic benchmark for evaluating AI systems' ability to prove mathematical theorems using the Lean proof assistant. The benchmark draws from 78 real-world formalization projects and addresses limitations of static benchmarks by providing continuously updated tasks that better reflect community needs.