AINeutralarXiv โ CS AI ยท 9h ago5/10
๐ง
Is Mathematical Problem-Solving Expertise in Large Language Models Associated with Assessment Performance?
Research reveals that Large Language Models (GPT-4 and GPT-5) demonstrate better assessment performance on math problems they can solve correctly versus those they cannot. While math problem-solving expertise supports assessment capabilities, step-level error diagnosis remains more challenging than direct problem solving.
๐ง GPT-4๐ง GPT-5