Expected Value Alignment for Generative Reward Modeling in Formal Mathematics Verification
Researchers introduce Expected Value Alignment (EVA), a novel reward-modeling technique that enables Large Language Models to provide continuous numerical scores while maintaining human-readable text output for formal mathematics verification in Lean 4. The method bridges a critical gap between discrete generative outputs and continuous value assessment needed for reinforcement learning in theorem proving systems.