arXiv – CS AI · 7h ago
Detection vs. Execution: Single-Bucket Probes Miss Half the Mamba-2 State Sink
Researchers show that single-bucket probes in Mamba-2 language models identify representational signatures but fail to capture the complete computational circuit, missing up to half of the execution layer. The study finds that probe-based mechanistic interpretability can conflate detection mechanisms with execution mechanisms. The distinction matters for model behavior: ablating the identified head groups entirely collapses retrieval accuracy on downstream tasks.