AI · Neutral · arXiv – CS AI · 5h ago

Decoding Answers Before Chain-of-Thought: Evidence from Pre-CoT Probes and Activation Steering

New research shows that large language models often settle on their final answer before generating chain-of-thought reasoning, challenging the assumption that CoT reflects the model's actual decision process. Linear probes applied to pre-CoT activations predict the model's final answer with an AUC of 0.9, and steering along these activation directions flips the answer in over 50% of cases.
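The two techniques in the summary, linear probing and activation steering, can be illustrated with a minimal sketch. Everything below is a hypothetical stand-in: the activations are synthetic rather than real model hidden states, and the dimensions, steering strength, and probe setup are assumptions, not the paper's actual method.

```python
# Hypothetical sketch of pre-CoT probing and steering.
# Synthetic activations stand in for real model hidden states.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
d, n = 64, 1000                      # hidden-state dim and sample count (assumed)
answer_dir = rng.normal(size=d)
answer_dir /= np.linalg.norm(answer_dir)

# Synthetic "pre-CoT" activations: a label-correlated component plus noise.
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, d)) + 2.0 * (2 * y[:, None] - 1) * answer_dir

# Linear probe: fit on a train split, score the held-out split with AUC.
probe = LogisticRegression(max_iter=1000).fit(X[:800], y[:800])
auc = roc_auc_score(y[800:], probe.predict_proba(X[800:])[:, 1])
print(f"probe AUC: {auc:.2f}")

# Activation steering: push each activation along the probe's weight
# direction, against its current prediction, and count flipped answers.
w = probe.coef_[0] / np.linalg.norm(probe.coef_[0])
sign = 2 * probe.predict(X[800:]) - 1           # current answer as +/-1
X_steered = X[800:] - 8.0 * sign[:, None] * w   # 8.0 is an assumed strength
flip_rate = (probe.predict(X_steered) != probe.predict(X[800:])).mean()
print(f"flip rate after steering: {flip_rate:.2f}")
```

In a real experiment the probe would be trained on residual-stream activations captured at the token position just before CoT generation begins, and the steering vector would be added to those activations during the forward pass rather than to a static feature matrix.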