AI · Neutral · arXiv – CS AI · 9h ago · 6/10
AlgoVeri: An Aligned Benchmark for Verified Code Generation on Classical Algorithms
Researchers introduced AlgoVeri, a unified benchmark for evaluating AI-generated, formally verified code across three major verification systems: Dafny, Verus, and Lean. The benchmark reveals large performance gaps between verification languages: frontier AI models achieve 40.3% success in Dafny but only 7.8% in Lean, which highlights fundamental challenges in cross-paradigm code verification.