12 articles tagged with #theorem-proving. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AIBullisharXiv โ CS AI ยท Mar 97/10
๐ง Researchers propose a new method for training large language models (LLMs) that addresses the diversity loss problem in reinforcement learning approaches. Their technique uses the ฮฑ-divergence family to better balance precision and diversity in reasoning tasks, achieving state-of-the-art performance on theorem-proving benchmarks.
AIBullisharXiv โ CS AI ยท Mar 57/10
๐ง Researchers have developed LeanTutor, a proof-of-concept AI system that combines Large Language Models with theorem provers to create a mathematically verified proof tutor. The system features three modules for autoformalization, proof-checking, and natural language feedback, evaluated using PeanoBench, a new dataset of 371 Peano Arithmetic proofs.
AINeutralarXiv โ CS AI ยท Mar 47/104
๐ง Researchers have introduced SorryDB, a dynamic benchmark for evaluating AI systems' ability to prove mathematical theorems using the Lean proof assistant. The benchmark draws from 78 real-world formalization projects and addresses limitations of static benchmarks by providing continuously updated tasks that better reflect community needs.
AIBullisharXiv โ CS AI ยท Mar 37/103
๐ง Researchers introduce GAR (Generative Adversarial Reinforcement Learning), a new AI training framework that jointly trains problem generators and solvers in an adversarial loop for formal theorem proving. The method shows significant improvements in mathematical proof capabilities, with models achieving 4.20% average relative improvement on benchmark tests.
AINeutralarXiv โ CS AI ยท Feb 277/107
๐ง Researchers introduced LeanCat, a benchmark comprising 100 category-theory tasks in Lean to test AI's formal theorem proving capabilities. State-of-the-art models achieved only 12% success rates, revealing significant limitations in abstract mathematical reasoning, while a new retrieval-augmented approach doubled performance to 24%.
AIBullishOpenAI News ยท Feb 27/105
๐ง Researchers have developed a neural theorem prover for Lean that successfully solved challenging high-school mathematics olympiad problems, including those from AMC12, AIME competitions, and two problems adapted from the International Mathematical Olympiad (IMO). This represents a significant advancement in AI's ability to handle formal mathematical reasoning and proof generation.
AINeutralarXiv โ CS AI ยท 6d ago6/10
๐ง Researchers present ProofSketcher, a hybrid system combining large language models with lightweight proof verification to address mathematical reasoning errors in AI-generated proofs. The approach bridges the gap between LLM efficiency and the formal rigor of interactive theorem provers like Lean and Coq, enabling more reliable automated reasoning without requiring full formalization.
$AVAX
AINeutralarXiv โ CS AI ยท Mar 55/10
๐ง A research paper discusses how AI systems are now capable of proving research-level mathematical theorems both formally and informally. The paper advocates for mathematicians to adapt to this technological disruption and consider both the challenges and opportunities it presents for mathematical practice.
AINeutralarXiv โ CS AI ยท Mar 27/1020
๐ง Researchers have developed LemmaBench, a new benchmark for evaluating Large Language Models on research-level mathematics by automatically extracting and rewriting lemmas from arXiv papers. Current state-of-the-art LLMs achieve only 10-15% accuracy on these mathematical theorem proving tasks, revealing a significant gap between AI capabilities and human-level mathematical research.
AIBullishSynced Review ยท Apr 306/106
๐ง DeepSeek AI has released DeepSeek-Prover-V2, an open-source large language model specifically designed for Lean 4 theorem proving. The model employs recursive proof search methodology and uses DeepSeek-V3 for training data generation with reinforcement learning, achieving top performance results on the MiniF2F benchmark.
AIBullishOpenAI News ยท Sep 76/105
๐ง The article discusses the application of generative language models to automated theorem proving, representing an advancement in AI's ability to generate mathematical proofs. This development could enhance AI systems' reasoning capabilities and formal verification processes.
AINeutralOpenAI News ยท Jun 24/106
๐ง GamePad is introduced as a learning environment specifically designed for theorem proving applications. The platform appears to focus on providing educational tools and resources for mathematical proof development and validation.