AI · Neutral · arXiv – CS AI · 3h ago · 6/10
Risk-Controlled Lean-as-Judge for Natural-Language Mathematical Reasoning
Researchers show that Lean formal proof verification can produce unreliable signals when used to validate natural-language mathematical reasoning, with accuracy ranging from 96% at high coverage to 20% at low coverage. They introduce COVCAL, a risk-control method that certifies when partial formal signals can be trusted, and show that whether such certification is feasible depends critically on autoformalization quality and coverage rate.