
LeanMarathon: Toward Reliable AI Co-Mathematicians through Long-Horizon Lean Autoformalization

arXiv – CS AI | Yuanhe Zhang, Yuekai Sun, Taiji Suzuki, Jason D. Lee, Fanghui Liu
AI Summary

LeanMarathon introduces a multi-agent system that automates the formalization of research mathematics in Lean, using an evolving blueprint architecture to handle the failure modes of long-horizon formalization. The system formalized seven theorems from recent research papers spanning four Erdős problems without manual verification shortcuts, a step toward reliable AI co-mathematicians.

Analysis

LeanMarathon addresses a critical bottleneck in automated mathematical proof generation: the fragility of long-horizon autoformalization. Traditional approaches fail not merely on difficult lemmas but at scale, where statement drift, tangled dependencies, context decay, and cascading errors from local repairs corrupt entire formal developments. This research tackles these systemic issues through architectural innovation rather than brute-force capability increases.

The system's core insight is an evolving blueprint that serves at once as formal skeleton, proof graph, and system of record, giving the agents a durable framework for coordination. The work is decomposed into construction, auditing, proving, and repair phases, orchestrated through adversarial review followed by parallel, CI-gated proof discharge. In this way LeanMarathon turns brittle monolithic runs into recoverable, distributed transactions, a pattern familiar from software engineering and distributed systems.
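The idea of a blueprint that doubles as a proof graph, with proofs discharged in gated rounds, can be illustrated with a small sketch. This is not the paper's implementation: the `Node` class, `ready_nodes`, `run_rounds`, and the `try_prove` callback are all hypothetical names, and the "gate" is reduced to a boolean check standing in for a CI build.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """One blueprint entry: a named statement plus the lemmas it depends on."""
    name: str
    deps: List[str] = field(default_factory=list)
    proven: bool = False


def ready_nodes(blueprint: Dict[str, Node]) -> List[str]:
    """Return unproven nodes whose dependencies are all proven."""
    return sorted(
        node.name
        for node in blueprint.values()
        if not node.proven and all(blueprint[d].proven for d in node.deps)
    )


def run_rounds(
    blueprint: Dict[str, Node],
    try_prove: Callable[[str], bool],  # hypothetical prover-plus-check callback
    max_rounds: int = 10,
) -> List[List[str]]:
    """Discharge proofs round by round; return the names merged in each round."""
    history: List[List[str]] = []
    for _ in range(max_rounds):
        batch = ready_nodes(blueprint)
        if not batch:
            break
        # Attempts within a batch are independent, so they could run in parallel.
        passed = [name for name in batch if try_prove(name)]
        # Gate: only attempts that passed the check are recorded as proven.
        for name in passed:
            blueprint[name].proven = True
        history.append(passed)
        if not passed:
            break  # no progress; a repair step would be needed here
    return history
```

A failed attempt leaves its node unproven, so only that node is retried in the next round while everything already merged stays intact. That is the sense in which a failure is local rather than cascading.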

The evaluation demonstrates non-trivial capability: all seven target theorems were formalized across three autonomous runs, with 258 lemmas and theorems proven and zero incomplete proofs (no "sorry" statements). The selected problems (Erdős problems #1051, #1196, #164, and #1217) are contemporary research-level mathematics rather than toy examples, which suggests applicability beyond controlled benchmarks.
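For readers unfamiliar with Lean, `sorry` is a placeholder that lets an unfinished proof compile with a warning, so a development with zero `sorry` has every stated result checked by the kernel. The toy example below, which is not taken from the paper, contrasts a skeleton statement with its discharged counterpart.

```lean
-- A skeleton statement: it compiles with a warning, but the proof is missing.
theorem add_comm_skeleton (a b : Nat) : a + b = b + a := by
  sorry

-- The same statement with the proof discharged; no `sorry` remains.
theorem add_comm_done (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```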

For the broader AI landscape, LeanMarathon signals that reliable AI systems often require robust operational infrastructure alongside improved base models. The focus on durable harnesses and fault-tolerant coordination suggests that future AI development will increasingly resemble production engineering, prioritizing reliability, auditability, and composability over raw capability.

Key Takeaways
  • LeanMarathon uses a multi-agent blueprint architecture to automate research-level mathematical formalization without manual verification shortcuts.
  • The system formalized seven theorems across four Erdős problems in three autonomous runs, proving 258 lemmas and theorems in total.
  • A novel orchestration strategy combines adversarial review, which stabilizes fidelity, with parallel proof discharge in CI-gated rounds to prevent cascading errors.
  • Results demonstrate that reliable AI co-mathematics requires durable operational harnesses alongside stronger underlying provers.
  • The architecture pattern of distributed transactions with local recovery offers lessons for designing robust AI systems beyond mathematical formalization.