
Baseline Performance of AI Tools in Classifying Cognitive Demand of Mathematical Tasks

arXiv – CS AI | Danielle S. Fox, Brenda L. Robles, Elizabeth DiPietro Brovey, Christian D. Schunn

AI Summary

A research study tested 11 AI tools on their ability to classify the cognitive demand of mathematical tasks, finding they achieved only 63% accuracy on average, with no tool exceeding 83%. The tools showed a systematic bias toward middle-category classifications and tended to rely on surface textual features rather than reasoning about the underlying cognitive processes a task demands.

Key Takeaways
  • AI tools achieved only 63% average accuracy in classifying cognitive demand of mathematical tasks, with no tool exceeding 83%.
  • Education-specific AI tools performed no better than general-purpose tools like ChatGPT and Claude.
  • All tools exhibited systematic bias toward middle-category levels and struggled with extreme cognitive demand classifications.
  • AI tools overweighted surface textual features rather than understanding underlying cognitive processes.
  • The findings highlight significant limitations in current AI tools for educational applications and teacher workflow integration.
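The middle-category bias described above can be quantified with a simple check: compare the share of middle-level predictions against the share of middle-level gold labels, alongside overall accuracy. A minimal sketch in Python — the four-level label scheme and the toy data here are illustrative assumptions, not the study's actual rubric or results:

```python
# Hypothetical 4-level cognitive-demand scale (labels are illustrative,
# not necessarily the scheme used in the study).
LEVELS = ["memorization", "procedures_no_conn", "procedures_conn", "doing_math"]
MIDDLE = {"procedures_no_conn", "procedures_conn"}

def evaluate(true_labels, pred_labels):
    """Return overall accuracy and the fraction of predictions landing in
    the two middle categories, to expose central-tendency bias."""
    n = len(true_labels)
    acc = sum(t == p for t, p in zip(true_labels, pred_labels)) / n
    mid_rate = sum(p in MIDDLE for p in pred_labels) / n
    return acc, mid_rate

# Toy data: a tool that over-predicts middle levels (illustrative only).
true_labels = ["memorization", "doing_math", "procedures_conn",
               "memorization", "doing_math"]
pred_labels = ["procedures_no_conn", "procedures_conn", "procedures_conn",
               "memorization", "procedures_conn"]

acc, mid_rate = evaluate(true_labels, pred_labels)
true_mid_rate = sum(t in MIDDLE for t in true_labels) / len(true_labels)
print(f"accuracy={acc:.2f}, predicted middle rate={mid_rate:.2f}, "
      f"true middle rate={true_mid_rate:.2f}")
```

A gap between the predicted and true middle-category rates (here 0.80 vs. 0.20 on the toy data) is the kind of signal that would reveal the systematic middle-category bias the study reports.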
Mentioned in AI
Companies: Perplexity
Models: ChatGPT (OpenAI), Claude (Anthropic), Gemini (Google), Grok (xAI)