Fragile Thoughts: How Large Language Models Handle Chain-of-Thought Perturbations
The study finds that large language models vary widely in their vulnerability to different types of chain-of-thought (CoT) reasoning perturbations: injected math errors cause a 50-60% accuracy loss in small models, while unit-conversion errors remain challenging even for the largest models. Testing 13 models ranging from 3B to 1.5T parameters, the authors find that scaling provides protection against some perturbation types but offers limited defense on dimensional-reasoning tasks.
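The paper's exact perturbation procedure is not reproduced here, but the general idea of a math-error perturbation can be sketched in a few lines: corrupt one arithmetic step in a CoT trace, then feed the corrupted trace back to the model and check whether the final answer survives. A minimal Python sketch under that assumption (all function names and the regex-based step detection are hypothetical, not the paper's method):

```python
import random
import re


def perturb_math_step(cot_trace: str, seed: int = 0) -> str:
    """Inject a single arithmetic error into a chain-of-thought trace.

    Finds statements of the form 'a <op> b = c', picks one at random,
    and replaces the stated result c with a slightly wrong value,
    leaving the rest of the reasoning intact. Purely illustrative;
    the paper's perturbation taxonomy is broader (e.g. it also covers
    unit-conversion errors).
    """
    rng = random.Random(seed)
    # Match simple binary arithmetic statements like "12 * 3 = 36".
    pattern = re.compile(r"(\d+)\s*([+\-*/])\s*(\d+)\s*=\s*(\d+)")
    matches = list(pattern.finditer(cot_trace))
    if not matches:
        return cot_trace  # nothing to perturb
    target = rng.choice(matches)
    wrong = int(target.group(4)) + rng.choice([-2, -1, 1, 2])
    start, end = target.span(4)  # location of the stated result
    return cot_trace[:start] + str(wrong) + cot_trace[end:]


# Example: corrupt one step of a trace, then re-prompt the model with
# the perturbed trace and compare its final answer to ground truth.
trace = "Step 1: 12 * 3 = 36. Step 2: 36 + 4 = 40. So the answer is 40."
print(perturb_math_step(trace))
```

Measuring accuracy before and after such an injection, per model size and perturbation type, would yield the kind of robustness comparison the study reports.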