Pythagoras-Prover: Advancing Efficient Formal Proving via Augmented Lean Formalisation
Pythagoras-Prover introduces a family of efficient Lean theorem provers that achieve state-of-the-art performance with significantly fewer parameters than existing models, using training techniques that include curriculum learning and augmented data generation. The 4B-parameter model outperforms DeepSeek-Prover-V2-671B despite having roughly 167x fewer parameters, while the 32B model sets new open-source state-of-the-art results on formal mathematics benchmarks.
Pythagoras-Prover represents a meaningful advance in formal verification technology by demonstrating that compute efficiency and mathematical reasoning capability need not be mutually exclusive. The research addresses a critical bottleneck in AI-assisted theorem proving: the computational cost of both training and inference on formal proof tasks. By outperforming a 671B baseline with a 4B model, the work suggests that architectural design and training methodology can matter substantially more than raw parameter count in this domain.
The technical innovations deserve attention from the broader AI community. Augmented Lean Formalisation (ALF) tackles data scarcity by generating synthetic variants of verified proofs while preserving formal correctness, a clever way to expand the training signal without manual verification overhead. The curriculum learning strategy progresses from simple to complex proofs, mirroring human mathematical education and improving sample efficiency. These techniques may also apply beyond theorem proving, in other domains where labeled data is scarce.
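To make the curriculum idea concrete, here is a minimal sketch of easy-to-hard staging. It is not the authors' implementation: the `ProofExample` type, the `difficulty` proxy (number of non-empty proof lines), and the cumulative staging scheme are all assumptions chosen for illustration, since the summary does not say how difficulty is measured or how stages are scheduled.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class ProofExample:
    """Hypothetical training record: a Lean statement and its proof text."""
    statement: str
    proof: str


def difficulty(example: ProofExample) -> int:
    # Assumed proxy for difficulty: the number of non-empty proof lines.
    # A real system might use solve rates or proof-search cost instead.
    return sum(1 for line in example.proof.splitlines() if line.strip())


def curriculum_stages(
    examples: List[ProofExample], num_stages: int
) -> Iterator[List[ProofExample]]:
    """Yield cumulative training pools ordered from easy to hard.

    Stage i contains the easiest ceil(i / num_stages) fraction of the data,
    so later stages keep the easy examples and add harder ones.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    ordered = sorted(examples, key=difficulty)
    total = len(ordered)
    for stage in range(1, num_stages + 1):
        # Integer ceiling of total * stage / num_stages.
        cutoff = (total * stage + num_stages - 1) // num_stages
        yield ordered[:cutoff]
```

Keeping earlier examples in later stages is one common design choice; it trades some training time for a lower risk of forgetting simple proof patterns.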
For the formal mathematics and verification community, this work lowers the barrier to entry for organizations with constrained computational resources. The open-source release of the models and the new MiniF2F-ALF benchmark enables broader experimentation and iteration. The 93% accuracy on MiniF2F-Test and the results on the new benchmark establish credible baselines for future research. However, the practical impact remains confined to academic and specialized verification use cases rather than mainstream applications. Real-world adoption of automated theorem proving still faces domain-specific challenges that algorithmic improvements alone do not solve.
- Pythagoras-Prover-4B outperforms DeepSeek-Prover-V2-671B at pass@32 on MiniF2F-Test despite having roughly 167x fewer parameters
- Augmented Lean Formalisation generates synthetic proof variants to expand training data without requiring formal re-verification
- A curriculum learning strategy trains models progressively from simpler to more complex proofs for improved sample efficiency
- The 32B model achieves 93% accuracy on MiniF2F-Test and solves 93 of 672 PutnamBench problems, both open-source state-of-the-art results
- Dynamic proof-reasoning filtering maintains informative training signals while constraining context to 8k tokens
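
For readers unfamiliar with the pass@32 metric cited above, the sketch below shows the standard unbiased pass@k estimator used in code-generation and proving evaluations: with n sampled attempts per problem, of which c are verified correct, pass@k = 1 - C(n-c, k) / C(n, k). The summary does not say how the paper computes the metric, so the function names and the choice of this estimator are assumptions. When exactly k attempts are sampled (n = k), it reduces to "solved if at least one attempt verifies".

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k attempts, drawn without
    replacement from n attempts with c correct, is correct."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect attempts exist, so any draw of k hits a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: List[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) per problem."""
    if not results:
        raise ValueError("results must be non-empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, a problem with 32 attempts and one verified proof contributes 1.0 to pass@32, while a problem with no verified proof contributes 0.0, so the benchmark score is simply the fraction of problems solved within the 32-attempt budget.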