#research-mathematics News & Analysis

2 articles tagged with #research-mathematics. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · Jun 25 · 7/10

Failure Modes of Large Language Models on Research-Level Mathematics: A Taxonomy and an Empirical Characterisation

Researchers identify four failure modes in large language models attempting research-level mathematics: citation fabrication, premise smuggling, silent problem reformulation, and local-to-global compatibility gaps. Their experiments show that premise smuggling, where a model presents an unjustified claim as an established result, persists even when its citations are accurate. This suggests that retrieval-augmented generation alone cannot fix these reasoning failures.

🧠 Gemini

AI · Bearish · arXiv – CS AI · Apr 10 · 7/10

Riemann-Bench: A Benchmark for Moonshot Mathematics

Researchers introduced Riemann-Bench, a private benchmark of 25 expert-curated mathematics problems designed to evaluate AI systems on research-level reasoning beyond competition mathematics. All frontier AI models currently score below 10% on it, exposing a significant gap between olympiad-level problem solving and genuine mathematical research capability.