y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#theorem-proving News & Analysis

12 articles tagged with #theorem-proving. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

12 articles
AIBullisharXiv โ€“ CS AI ยท Mar 97/10
๐Ÿง 

Whatever Remains Must Be True: Filtering Drives Reasoning in LLMs, Shaping Diversity

Researchers propose a new method for training large language models (LLMs) that addresses the diversity loss problem in reinforcement learning approaches. Their technique uses the ฮฑ-divergence family to better balance precision and diversity in reasoning tasks, achieving state-of-the-art performance on theorem-proving benchmarks.

AIBullisharXiv โ€“ CS AI ยท Mar 57/10
๐Ÿง 

LeanTutor: Towards a Verified AI Mathematical Proof Tutor

Researchers have developed LeanTutor, a proof-of-concept AI system that combines Large Language Models with theorem provers to create a mathematically verified proof tutor. The system features three modules for autoformalization, proof-checking, and natural language feedback, evaluated using PeanoBench, a new dataset of 371 Peano Arithmetic proofs.

AINeutralarXiv โ€“ CS AI ยท Mar 47/104
๐Ÿง 

SorryDB: Can AI Provers Complete Real-World Lean Theorems?

Researchers have introduced SorryDB, a dynamic benchmark for evaluating AI systems' ability to prove mathematical theorems using the Lean proof assistant. The benchmark draws from 78 real-world formalization projects and addresses limitations of static benchmarks by providing continuously updated tasks that better reflect community needs.

AIBullisharXiv โ€“ CS AI ยท Mar 37/103
๐Ÿง 

GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving

Researchers introduce GAR (Generative Adversarial Reinforcement Learning), a new AI training framework that jointly trains problem generators and solvers in an adversarial loop for formal theorem proving. The method shows significant improvements in mathematical proof capabilities, with models achieving 4.20% average relative improvement on benchmark tests.

AINeutralarXiv โ€“ CS AI ยท Feb 277/107
๐Ÿง 

LeanCat: A Benchmark Suite for Formal Category Theory in Lean (Part I: 1-Categories)

Researchers introduced LeanCat, a benchmark comprising 100 category-theory tasks in Lean to test AI's formal theorem proving capabilities. State-of-the-art models achieved only 12% success rates, revealing significant limitations in abstract mathematical reasoning, while a new retrieval-augmented approach doubled performance to 24%.

AIBullishOpenAI News ยท Feb 27/105
๐Ÿง 

Solving (some) formal math olympiad problems

Researchers have developed a neural theorem prover for Lean that successfully solved challenging high-school mathematics olympiad problems, including those from AMC12, AIME competitions, and two problems adapted from the International Mathematical Olympiad (IMO). This represents a significant advancement in AI's ability to handle formal mathematical reasoning and proof generation.

AINeutralarXiv โ€“ CS AI ยท 6d ago6/10
๐Ÿง 

ProofSketcher: Hybrid LLM + Lightweight Proof Checker for Reliable Math/Logic Reasoning

Researchers present ProofSketcher, a hybrid system combining large language models with lightweight proof verification to address mathematical reasoning errors in AI-generated proofs. The approach bridges the gap between LLM efficiency and the formal rigor of interactive theorem provers like Lean and Coq, enabling more reliable automated reasoning without requiring full formalization.

$AVAX
AINeutralarXiv โ€“ CS AI ยท Mar 55/10
๐Ÿง 

Mathematicians in the age of AI

A research paper discusses how AI systems are now capable of proving research-level mathematical theorems both formally and informally. The paper advocates for mathematicians to adapt to this technological disruption and consider both the challenges and opportunities it presents for mathematical practice.

AINeutralarXiv โ€“ CS AI ยท Mar 27/1020
๐Ÿง 

LemmaBench: A Live, Research-Level Benchmark to Evaluate LLM Capabilities in Mathematics

Researchers have developed LemmaBench, a new benchmark for evaluating Large Language Models on research-level mathematics by automatically extracting and rewriting lemmas from arXiv papers. Current state-of-the-art LLMs achieve only 10-15% accuracy on these mathematical theorem proving tasks, revealing a significant gap between AI capabilities and human-level mathematical research.

AIBullishOpenAI News ยท Sep 76/105
๐Ÿง 

Generative language modeling for automated theorem proving

The article discusses the application of generative language models to automated theorem proving, representing an advancement in AI's ability to generate mathematical proofs. This development could enhance AI systems' reasoning capabilities and formal verification processes.

AINeutralOpenAI News ยท Jun 24/106
๐Ÿง 

GamePad: A learning environment for theorem proving

GamePad is introduced as a learning environment specifically designed for theorem proving applications. The platform appears to focus on providing educational tools and resources for mathematical proof development and validation.