
TheoremGraph: Bridging Formal and Informal Mathematics

arXiv – CS AI | Simon Kurgan, Evan Wang, Eric Leonen, Sophie Szeto, Luke Alexander, Artemii Remizov, Jarod Alper, Giovanni Inchiostro, Vasily Ilin
AI Summary

Researchers introduce TheoremGraph, a unified dependency graph linking 11.7M informal mathematical statements from arXiv with 388,105 formal Lean 4 declarations through semantic embeddings. The infrastructure bridges the historically fragmented landscape of mathematical knowledge representation, enabling improved discovery and reasoning across both informal academic papers and formally verified mathematics.

Analysis

TheoremGraph addresses a fundamental infrastructure gap in mathematics: the disconnect between how mathematicians informally document knowledge in papers and how formal verification systems record it. This unification matters for AI systems that reason over mathematical content, because it creates a machine-readable map of mathematical dependencies spanning both worlds. The project's scale, with 11.7M theorem-like statements analyzed and 18.3M candidate dependencies recovered on the informal side alone, demonstrates that structured mathematical knowledge can be extracted automatically from the literature.

Mathematical knowledge infrastructure has long been siloed. Formal libraries written in Lean provide machine-checkable proofs with precise dependencies, but cover only a limited part of mathematics. Academic papers reach broader mathematical domains but lack fine-grained, machine-readable structure. TheoremGraph bridges this gap by parsing arXiv papers, extracting theorem statements, and embedding them into a shared semantic space with formal declarations. The 47,952 validated cross-graph matches link informal statements to formal declarations, with LLM-judge acceptance rates climbing to 87% for high-confidence matches above 0.9 cosine similarity.
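To make the matching step concrete, here is a minimal sketch of thresholded cosine-similarity matching between two sets of embeddings. It is an illustration only, not the project's pipeline: the function names, the identifiers, and the toy 3-dimensional vectors are all hypothetical, and the only detail taken from the article is the 0.9 similarity cutoff.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)

def match_candidates(informal, formal, threshold=0.9):
    """Pair each informal statement with its most similar formal
    declaration, keeping only pairs at or above the threshold."""
    matches = []
    for inf_id, inf_vec in informal.items():
        best_id, best_sim = None, -1.0
        for f_id, f_vec in formal.items():
            sim = cosine_similarity(inf_vec, f_vec)
            if sim > best_sim:
                best_id, best_sim = f_id, sim
        if best_id is not None and best_sim >= threshold:
            matches.append((inf_id, best_id, best_sim))
    return matches

# Hypothetical toy embeddings; real embeddings would have far more dimensions.
informal = {
    "informal_stmt_1": [1.0, 0.0, 0.0],
    "informal_stmt_2": [0.0, 1.0, 1.0],
}
formal = {
    "formal_decl_a": [0.9, 0.1, 0.0],
    "formal_decl_b": [1.0, 0.0, -1.0],
}

matches = match_candidates(informal, formal)
for inf_id, f_id, sim in matches:
    print(f"{inf_id} -> {f_id} (cosine {sim:.3f})")
```

In this toy data only the first informal statement clears the cutoff. A brute-force double loop like this would not be practical for millions of statements, where an approximate nearest-neighbor index would be the usual choice.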

For the AI and mathematics communities, this infrastructure enables retrieval across both kinds of mathematics. Retrieval-augmented generation systems can traverse formal and informal mathematics when reasoning, and the released API and dataset allow researchers to build better mathematical search tools and attribution systems. On formal concept retrieval, the project reaches 0.775 Recall@10 without language model reranking, which suggests the graph structure itself contains meaningful signal. This foundation could accelerate the development of AI systems that combine the verification rigor of formal mathematics with the breadth of informal mathematical literature.
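For readers unfamiliar with the metric, the sketch below shows one common way to compute Recall@k: for each query, the fraction of its relevant items that appear in the top k results, averaged over queries. The article does not describe the exact evaluation protocol, so this definition, the function name, and the toy data are assumptions.

```python
def recall_at_k(ranked, relevant, k=10):
    """Mean over queries of |top-k results ∩ relevant| / |relevant|.

    ranked:   query id -> list of result ids, best first
    relevant: query id -> set of ids judged relevant
    """
    scores = []
    for query, gold in relevant.items():
        if not gold:
            continue  # a query with no relevant items has undefined recall
        top_k = set(ranked.get(query, [])[:k])
        scores.append(len(top_k & gold) / len(gold))
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical toy run with two queries.
ranked = {
    "q1": ["a", "b", "c"],
    "q2": ["x", "y", "z"],
}
relevant = {
    "q1": {"a", "c"},
    "q2": {"w", "y"},
}

score = recall_at_k(ranked, relevant, k=2)
print(score)
```

Under this definition, a Recall@10 of 0.775 would mean that, on average, about three quarters of the relevant formal declarations appear among the first ten results.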

Key Takeaways
  • TheoremGraph unifies 11.7M informal theorem statements from arXiv with 388,105 formal Lean 4 declarations through semantic embeddings.
  • The system recovers 18.3M candidate dependencies in informal mathematics and 11.3M typed edges in formal Lean projects.
  • Cross-graph matching between informal and formal mathematics achieves 47,952 validated links with up to 87% LLM-judge acceptance at high confidence tiers.
  • Open-source infrastructure released via API and datasets enables downstream applications in mathematical search, attribution, and retrieval-augmented reasoning.
  • Name-and-signature representation with graph expansion reaches 0.775 Recall@10 on formal concept retrieval, near-competitive performance without expensive language model reranking.
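
The last takeaway mentions graph expansion without saying how it works. One plausible reading, sketched below, is that the top embedding hits are used as seeds and their neighbors in the dependency graph are added with a discounted score. This is an assumption for illustration, not the paper's method: the function name, the decay factor, and the toy graph are all hypothetical.

```python
def expand_with_neighbors(seed_scores, edges, decay=0.5, top_k=10):
    """One-hop expansion: each neighbor of a seed inherits the seed's
    score times `decay`, keeping the best score seen per node."""
    expanded = dict(seed_scores)
    for node, score in seed_scores.items():
        for neighbor in edges.get(node, ()):
            candidate = score * decay
            if candidate > expanded.get(neighbor, 0.0):
                expanded[neighbor] = candidate
    # Sort by descending score, breaking ties by name for determinism.
    ranked = sorted(expanded.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]

# Hypothetical similarity scores from an initial embedding search.
seed_scores = {"decl_A": 0.92, "decl_B": 0.80}

# Hypothetical dependency edges: declaration -> declarations it depends on.
edges = {
    "decl_A": ["decl_C", "decl_D"],
    "decl_B": ["decl_A", "decl_E"],
}

results = expand_with_neighbors(seed_scores, edges)
for name, score in results:
    print(f"{name}: {score:.2f}")
```

Here decl_C, decl_D, and decl_E enter the ranking only through their edges, while decl_A keeps its own higher seed score rather than the discounted one inherited from decl_B.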