AI · Neutral · arXiv – CS AI · 5h ago · 6/10
Automatically Generating Hard Math Problems from Hypothesis-Driven Error Analysis
Researchers have developed a new automated pipeline that generates challenging math problems by first identifying specific mathematical concepts where LLMs struggle, then creating targeted problems to test these weaknesses. The method successfully reduced a leading LLM's accuracy from 77% to 45%, demonstrating its effectiveness at creating more rigorous benchmarks.
Llama