AI · Neutral · arXiv – CS AI · 7h ago · 6/10
ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark
Researchers introduce ASyMOB, a 35,368-problem benchmark dataset for evaluating large language models on symbolic mathematics tasks. The dataset uses systematic perturbations to test genuine reasoning rather than pattern memorization, revealing that most models fail under minor problem variations while hybrid LLM-computer algebra system approaches show promise for scientific computing applications.
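To make the perturbation idea concrete, here is a minimal sketch of how a seed problem might be varied and a candidate answer checked. It is not ASyMOB's actual pipeline: the seed function, the `perturb` and `matches_derivative` helpers, and the choice of a numeric finite-difference check in place of a real computer algebra system are all assumptions made for illustration.

```python
import math
import random

def seed(x):
    # Hypothetical seed problem: differentiate f(x) = x^2 * sin(x).
    return x**2 * math.sin(x)

def seed_derivative(x):
    # Known answer for the seed problem (product rule).
    return 2 * x * math.sin(x) + x**2 * math.cos(x)

def perturb(f, df, a, k):
    """Build a perturbed variant g(x) = a * f(k * x) and its derivative.

    By the chain rule, g'(x) = a * k * f'(k * x).
    """
    def g(x):
        return a * f(k * x)

    def dg(x):
        return a * k * df(k * x)

    return g, dg

def matches_derivative(g, candidate, trials=20, h=1e-5, tol=1e-4, rng_seed=0):
    """Check a candidate derivative of g against central differences."""
    rng = random.Random(rng_seed)
    for _ in range(trials):
        x = rng.uniform(-2.0, 2.0)
        numeric = (g(x + h) - g(x - h)) / (2 * h)
        if abs(numeric - candidate(x)) > tol * (1 + abs(numeric)):
            return False
    return True

# A small perturbation of the seed: scale by 3 and substitute x -> 2x.
variant, variant_derivative = perturb(seed, seed_derivative, a=3.0, k=2.0)
```

In this sketch, `matches_derivative(variant, variant_derivative)` accepts the correctly derived answer, while `matches_derivative(variant, seed_derivative)` rejects an answer recalled from the unperturbed problem, which is the kind of failure a perturbation-based benchmark is meant to expose.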