AINeutralarXiv – CS AI · 8h ago6/10
🧠
Can Reasoning Models Detect Changes to their Chains of Thought?
Researchers studied whether advanced reasoning models can detect modifications to their chains of thought (CoT), finding that models exhibit only modest detection accuracy and struggle to identify how their reasoning was altered. This suggests that interventions like prefilling reasoning from stronger models or removing unsafe steps may succeed partly because models cannot reliably detect the tampering.