AI · Bearish · arXiv – CS AI · 5h ago
When Shallow Wins: Silent Failures and the Depth-Accuracy Paradox in Latent Reasoning
Research reveals that state-of-the-art mathematical reasoning models such as Qwen2.5-Math-7B reach 61% accuracy mostly through unreliable computational pathways: only 18.4% of correct predictions come from stable reasoning, while the remaining 81.6% come from inconsistent methods. The study also finds that 8.8% of outputs are confident but incorrect.
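To put the headline figures on a common scale, the short sketch below converts the split of correct predictions into shares of all problems. It assumes that the 18.4% / 81.6% split applies to the correct predictions behind the reported 61% accuracy on the same benchmark; the variable names are illustrative only.

```python
# Assumption: the 18.4% / 81.6% split describes the correct predictions
# that make up the reported 61% accuracy (same benchmark, same model).
accuracy = 0.61                      # overall accuracy reported for Qwen2.5-Math-7B
stable_share = 0.184                 # share of correct predictions from stable reasoning
unstable_share = 1 - stable_share    # 0.816, share from inconsistent pathways

# Express each kind of correct prediction as a fraction of all problems.
stable_overall = accuracy * stable_share
unstable_overall = accuracy * unstable_share

print(f"Correct via stable reasoning:  {stable_overall:.1%} of all problems")
print(f"Correct via unstable pathways: {unstable_overall:.1%} of all problems")
```

Under that assumption, roughly 11.2% of all problems would be solved through stable reasoning and about 49.8% through inconsistent pathways. The 8.8% confident-but-incorrect figure is left out because the summary does not state its denominator.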