arXiv · CS AI · 5h ago
Decoding Answers Before Chain-of-Thought: Evidence from Pre-CoT Probes and Activation Steering
New research suggests that large language models often determine their final answers before generating chain-of-thought reasoning, challenging the assumption that CoT reflects the model's actual decision process. Linear probes trained on pre-CoT activations predict the model's answer with 0.9 AUC, and steering those activations flips the final answer in over 50% of cases.
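The two techniques in the summary can be illustrated on toy data. The sketch below is not the paper's code: the "activations" are synthetic vectors with a planted answer direction, and the probe dimensionality, separation scale, and steering strength are all arbitrary assumptions. It shows the general recipe: fit a linear probe (logistic regression) on activation vectors, then add the probe's weight direction to flip the probe's predicted answer.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical stand-in for pre-CoT activations: two answer classes
# separated along a single planted direction (dimensions are arbitrary).
d, n = 64, 1000
labels = rng.integers(0, 2, size=n)
direction = rng.normal(size=d)
acts = rng.normal(size=(n, d)) + np.outer(labels * 2 - 1, direction) * 0.2

X_tr, X_te, y_tr, y_te = train_test_split(acts, labels, random_state=0)

# Linear probe: logistic regression on the raw activation vectors.
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, probe.predict_proba(X_te)[:, 1])
print(f"probe AUC: {auc:.2f}")

# Toy activation steering: push class-0 activations along the probe's
# (unit-normalized) weight direction; the scale 4.0 is an assumption.
w = probe.coef_[0] / np.linalg.norm(probe.coef_[0])
steered = X_te[y_te == 0] + 4.0 * w
flip_rate = probe.predict(steered).mean()
print(f"fraction of class-0 answers flipped: {flip_rate:.2f}")
```

In the paper the intervention targets the model's actual generated answer, not just a probe's prediction; this sketch only demonstrates the linear-direction mechanics that both techniques rely on.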