AI · Bearish · arXiv – CS AI · 4h ago · 7/10
Failure Modes of Large Language Models on Research-Level Mathematics: A Taxonomy and an Empirical Characterisation
Researchers identify four specific failure modes in large language models attempting research-level mathematics: citation fabrication, premise smuggling, silent problem reformulation, and local-to-global compatibility gaps. Testing reveals that premise smuggling—where models assert unjustified claims as fundamental results—persists even when citations are accurate, suggesting retrieval-augmented generation alone cannot solve LLM reasoning failures.