When and How Long? The Readout-Mediator Angle in Temporal Reasoning
Researchers demonstrate that linear probes can successfully decode information from neural networks while remaining completely disconnected from how models actually process that information. Using calendar-date reasoning tasks, they show that probes identifying day-of-year information are orthogonal to the causal mechanisms models use for duration reasoning, revealing a fundamental flaw in probe-based interpretability methods.