AI · Bearish · arXiv (CS AI) · 7h ago · 7/10
Riemann-Bench: A Benchmark for Moonshot Mathematics
Researchers introduced Riemann-Bench, a private benchmark of 25 expert-curated mathematics problems designed to evaluate AI systems on research-level reasoning beyond competition mathematics. The benchmark reveals that all frontier AI models currently score below 10%, exposing a significant gap between olympiad-level problem solving and genuine mathematical research capabilities.