AI · Neutral · arXiv – CS AI · Feb 27 · 7/10
LeanCat: A Benchmark Suite for Formal Category Theory in Lean (Part I: 1-Categories)
Researchers introduced LeanCat, a benchmark of 100 category-theory tasks formalized in Lean, designed to test the formal theorem-proving capabilities of AI systems. State-of-the-art models achieved a success rate of only 12%, revealing significant limitations in abstract mathematical reasoning. A new retrieval-augmented approach doubled performance to 24%.
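To give a sense of what a formal 1-category task involves, here is an illustrative statement and proof in Lean 4 with Mathlib. It is a hypothetical example written for this summary, not a task taken from LeanCat, and it assumes a recent Mathlib. It proves that the composite of two monomorphisms is a monomorphism by cancelling `g` and then `f` using Mathlib's `cancel_mono`.

```lean
import Mathlib

open CategoryTheory

variable {C : Type*} [Category C]

/-- Hypothetical illustrative task (not from LeanCat):
    the composite of two monomorphisms is a monomorphism. -/
example {X Y Z : C} (f : X ⟶ Y) (g : Y ⟶ Z) [Mono f] [Mono g] :
    Mono (f ≫ g) := by
  constructor
  intro W a b h
  -- h : a ≫ f ≫ g = b ≫ f ≫ g; goal : a = b
  -- Cancel the monomorphism f: it suffices to show a ≫ f = b ≫ f.
  apply (cancel_mono f).1
  -- Cancel the monomorphism g: it suffices to show (a ≫ f) ≫ g = (b ≫ f) ≫ g.
  apply (cancel_mono g).1
  -- Reassociate both sides to match the hypothesis h.
  simp only [Category.assoc]
  exact h
```

Even a short argument like this requires the prover to pick the right library lemmas and manage associativity explicitly, which hints at why more abstract category-theory tasks remain hard for current models.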