AI · Bearish · arXiv – CS AI · 5h ago
Baseline Performance of AI Tools in Classifying Cognitive Demand of Mathematical Tasks
A research study tested 11 AI tools on their ability to classify the cognitive demand of mathematical tasks. The tools averaged only 63% accuracy, and none exceeded 83%. They showed a systematic bias toward middle-category classifications and struggled to reason about the underlying cognitive processes rather than surface textual features.