#lean-4 News & Analysis

10 articles tagged with #lean-4. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

TheoremGraph: Bridging Formal and Informal Mathematics

Researchers introduce TheoremGraph, a unified dependency graph linking 11.7M informal mathematical statements from arXiv with 388,105 formal Lean 4 declarations through semantic embeddings. The infrastructure bridges the historically fragmented landscape of mathematical knowledge representation, enabling improved discovery and reasoning across both informal academic papers and formally verified mathematics.

🏢 Hugging Face

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

Expected Value Alignment for Generative Reward Modeling in Formal Mathematics Verification

Researchers introduce Expected Value Alignment (EVA), a novel reward-modeling technique that enables Large Language Models to provide continuous numerical scores while maintaining human-readable text output for formal mathematics verification in Lean 4. The method bridges a critical gap between discrete generative outputs and continuous value assessment needed for reinforcement learning in theorem proving systems.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

FVSpec: Real-World Property-Based Tests as Lean Challenges

Researchers have created FVSpec, a benchmark dataset of 9,415 Lean 4 formal specifications derived from 2,772 real-world Python property-based tests, designed to evaluate AI models on automated formal software verification tasks. The work addresses a critical gap in AI-assisted code verification by providing open-source tools and data to advance AI's capability to formally prove software correctness.
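
To illustrate what turning a property-based test into a Lean 4 specification can look like, here is a minimal sketch. It is a hypothetical example and not taken from the FVSpec dataset: the Python test in the comment and the theorem name `reverse_twice` are assumptions made for illustration only.

```lean
-- Hypothetical Python property-based test (Hypothesis style), shown for context:
--   @given(lists(integers()))
--   def test_reverse_twice(xs):
--       assert list(reversed(list(reversed(xs)))) == xs
--
-- A corresponding Lean 4 statement: reversing a list twice returns the original list.
-- The test samples random inputs; the theorem claims the property for every input.
theorem reverse_twice (xs : List Int) : xs.reverse.reverse = xs := by
  simp
```

A benchmark task of this kind would hand a model the theorem statement and ask it to supply the proof; here the built-in simplifier closes the goal.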

AI · Bullish · arXiv – CS AI · May 29 · 7/10

Formalizing Mathematics at Scale

Researchers have developed AutoformBot, a multi-agent AI system that automatically translates informal mathematics textbooks into machine-verified formal proofs in Lean 4. The team successfully formalized 26 open-access textbooks into a library called Atlas containing over 45,000 declarations and 500,000 lines of verified code, demonstrating that large-scale automated mathematics formalization is now economically viable.

AI · Bullish · arXiv – CS AI · May 29 · 7/10

Keep the Proof State Live: Snapshotting for Efficient Tactic Search in Lean 4

Researchers introduce proof-state snapshotting, a technique that accelerates automated theorem proving in Lean 4 by reusing elaborated proof states across parallel search branches instead of reconstructing them. The method achieves speedups of 5.6x to 50x (14x on average) on benchmark problems, addressing a critical bottleneck in which per-branch overhead from import loading and elaboration consumed over 99% of computation time.
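
The idea of reusing one elaborated state across branches can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the paper's implementation and not a Lean API: `ToyProver`, `search_rebuild`, and `search_snapshot` are hypothetical names, and a counter stands in for the expensive import loading and elaboration.

```python
import copy


class ToyProver:
    """Toy stand-in for a prover session (hypothetical, not a Lean API)."""

    def __init__(self):
        self.elaborations = 0  # counts how often the expensive setup runs

    def elaborate(self, theorem):
        # Stands in for import loading plus elaboration of the theorem statement.
        self.elaborations += 1
        return {"theorem": theorem, "applied": []}

    def apply_tactic(self, state, tactic):
        # Work on a copy so the shared snapshot is never mutated by a branch.
        branch = copy.deepcopy(state)
        branch["applied"].append(tactic)
        return branch


def search_rebuild(prover, theorem, tactics):
    # Baseline: every branch rebuilds the proof state from scratch.
    return [prover.apply_tactic(prover.elaborate(theorem), t) for t in tactics]


def search_snapshot(prover, theorem, tactics):
    # Snapshotting: elaborate once, then branch from the saved state.
    snapshot = prover.elaborate(theorem)
    return [prover.apply_tactic(snapshot, t) for t in tactics]


if __name__ == "__main__":
    tactics = ["simp", "omega", "ring"]
    baseline, snap = ToyProver(), ToyProver()
    search_rebuild(baseline, "a + b = b + a", tactics)
    search_snapshot(snap, "a + b = b + a", tactics)
    print(baseline.elaborations, snap.elaborations)
```

In this toy model the setup cost grows with the number of branches in the baseline and stays constant with a snapshot, which is the effect the summary attributes to the technique.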

AI · Neutral · arXiv – CS AI · Apr 10 · 7/10

The Defense Trilemma: Why Prompt Injection Defense Wrappers Fail?

Researchers prove mathematically that no continuous input-preprocessing defense can simultaneously maintain utility, preserve model functionality, and guarantee safety against prompt injection attacks in language models with connected prompt spaces. The findings establish a fundamental trilemma showing that defenses must inevitably fail at some threshold inputs, with results verified in Lean 4 and validated empirically across three LLMs.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Hypothesis-Disciplined Multi-Agent Automated Formalization of Asymptotic Statistical Theory

Researchers have developed a multi-agent AI system that formalizes asymptotic statistical theory in Lean 4, a mathematically complex domain combining convergence statements, functional analysis, and regularity conditions. The hypothesis-disciplined approach ensures every formalization claim is anchored to source mathematics, producing axiom-clean and human-audited proofs for parametric and semi-parametric statistical models.

AI · Bullish · arXiv – CS AI · Jun 5 · 6/10

Evaluation of LLMs for Mathematical Formalization in Lean

Researchers compared Large Language Models' ability to generate formal mathematical proofs in Lean 4, finding that Gemini 3.1 Pro and Claude Opus 4.7 achieved the highest success rates (92% and 86% respectively), while NVIDIA Nemotron 3 Super and GPT-OSS 120B offered the best cost-efficiency at under $0.01 per correct proof.

🏢 Nvidia · 🧠 Claude · 🧠 Opus

AI · Bullish · arXiv – CS AI · Jun 5 · 6/10

Goedel-Architect: Streamlining Formal Theorem Proving with Blueprint Generation and Refinement

Goedel-Architect is a new AI framework for formal theorem proving that uses blueprint generation and refinement to achieve state-of-the-art results on mathematical benchmarks. Built on DeepSeek-V4-Flash, it demonstrates significant improvements in solving complex mathematical problems at a cost up to 500x lower than comparable solutions.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

FormalRewardBench: A Benchmark for Formal Theorem Proving Reward Models

Researchers introduce FormalRewardBench, the first benchmark for evaluating reward models in formal theorem proving using Lean 4. The benchmark reveals that frontier LLMs like Claude Opus outperform specialized theorem provers at evaluating proof quality, suggesting that theorem-proving ability does not transfer to proof-evaluation tasks.

🧠 Claude · 🧠 Opus