#model-circuits News & Analysis

2 articles tagged with #model-circuits. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.


AI · Bearish · arXiv – CS AI · May 29 · 7/10


When and How Long? The Readout-Mediator Angle in Temporal Reasoning

Researchers show that linear probes can accurately decode information from a neural network's activations even when the probed representation is disconnected from how the model actually processes that information. On calendar-date reasoning tasks, probes that decode day-of-year turn out to be orthogonal to the causal mechanism the model uses for duration reasoning. The authors argue this exposes a fundamental flaw in probe-based interpretability: successful decoding does not show that the model uses what the probe finds.
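To make the gap between "decodable" and "causally used" concrete, here is a minimal Python sketch on synthetic data, not the paper's setup or models. It assumes a toy 2-D activation space where day-of-year is written along one axis (`READOUT_DIR`) and the signal a downstream duration computation reads lies along an orthogonal axis (`CAUSAL_DIR`); both directions and `duration_readout` are hypothetical. A least-squares linear probe recovers day-of-year exactly, yet its weight vector is orthogonal to the causal direction, so ablating the probe direction leaves the duration computation unchanged.

```python
import math
import random

# Toy, hypothetical setup (not the paper's data or models): 2-D "activations"
# where day-of-year is written along READOUT_DIR and the signal a downstream
# duration computation reads lies along the orthogonal CAUSAL_DIR.
READOUT_DIR = (1.0, 0.0)
CAUSAL_DIR = (0.0, 1.0)


def make_activations(n, seed=0):
    """Generate n synthetic activations plus their day-of-year labels."""
    rng = random.Random(seed)
    xs, days = [], []
    for _ in range(n):
        day = rng.randint(1, 365)
        duration = rng.randint(1, 30)
        xs.append((day * READOUT_DIR[0] + duration * CAUSAL_DIR[0],
                   day * READOUT_DIR[1] + duration * CAUSAL_DIR[1]))
        days.append(day)
    return xs, days


def fit_linear_probe(xs, ys):
    """Least-squares probe without intercept: solve (X^T X) w = X^T y in 2-D."""
    sxx = sum(x[0] * x[0] for x in xs)
    sxy = sum(x[0] * x[1] for x in xs)
    syy = sum(x[1] * x[1] for x in xs)
    t0 = sum(x[0] * y for x, y in zip(xs, ys))
    t1 = sum(x[1] * y for x, y in zip(xs, ys))
    det = sxx * syy - sxy * sxy
    return ((syy * t0 - sxy * t1) / det, (sxx * t1 - sxy * t0) / det)


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def ablate(x, direction):
    """Remove the component of x along `direction` (orthogonal projection)."""
    coef = dot(x, direction) / dot(direction, direction)
    return (x[0] - coef * direction[0], x[1] - coef * direction[1])


def duration_readout(x):
    """Hypothetical downstream computation that reads only CAUSAL_DIR."""
    return dot(x, CAUSAL_DIR)


xs, days = make_activations(200)
w = fit_linear_probe(xs, days)
probe_err = max(abs(dot(w, x) - d) for x, d in zip(xs, days))
cos_probe_causal = dot(w, CAUSAL_DIR) / (math.hypot(*w) * math.hypot(*CAUSAL_DIR))
readout_shift = max(abs(duration_readout(ablate(x, w)) - duration_readout(x)) for x in xs)

print(f"max probe error: {probe_err:.2e}")
print(f"cosine(probe, causal direction): {cos_probe_causal:.2e}")
print(f"max duration-readout change after ablating probe: {readout_shift:.2e}")
```

In this toy, decoding accuracy and causal relevance are separate questions: the probe is perfect, but removing its direction changes nothing the duration computation depends on. In a real model the causal direction is not known in advance and would have to be located with interventions such as activation patching.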

AI · Neutral · arXiv – CS AI · May 29 · 6/10


Mechanistic origins of catastrophic forgetting: why RL preserves circuits better than SFT?

Researchers demonstrate that reinforcement learning (RL) preserves internal computational circuits in large language models better than supervised fine-tuning (SFT) during task adaptation. Using a new metric called differential circuit vulnerability on Qwen2.5-3B-Instruct, they reveal a mechanistic trade-off: SFT adapts faster but causes substantial circuit disruption and capability forgetting, while RL maintains base model circuits at the cost of slower learning.
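The summary does not define differential circuit vulnerability, so the sketch below does not implement it. As a minimal Python illustration of what "preserving circuits" can mean, it assumes a circuit is represented as a set of edges between named components (names such as `L5.H3` are hypothetical) and measures the fraction of base-model edges that survive fine-tuning. The SFT and RL edge sets are invented only to show the arithmetic and are not results from the paper.

```python
# Hypothetical circuits: each is a set of (source, target) edges between
# model components. In practice such edge sets would come from a circuit-
# discovery method; these toy sets are invented purely for illustration.
Edge = tuple[str, str]


def edge_retention(base: set[Edge], tuned: set[Edge]) -> float:
    """Fraction of the base circuit's edges still present after fine-tuning."""
    if not base:
        raise ValueError("base circuit must contain at least one edge")
    return len(base & tuned) / len(base)


base = {("L2.H1", "L5.H3"), ("L5.H3", "L9.MLP"),
        ("L4.H0", "L9.MLP"), ("L9.MLP", "logits")}
# Toy post-fine-tuning circuits (made-up edges, not measured results).
after_sft = {("L2.H1", "L5.H3"), ("L6.H2", "L9.MLP"), ("L9.MLP", "logits")}
after_rl = base | {("L7.H4", "logits")}

sft_retention = edge_retention(base, after_sft)
rl_retention = edge_retention(base, after_rl)
print(f"SFT retention: {sft_retention:.2f}")
print(f"RL retention: {rl_retention:.2f}")
# A simple gap between the two; this is NOT the paper's differential
# circuit vulnerability metric, whose definition is not given here.
print(f"retention gap (RL - SFT): {rl_retention - sft_retention:.2f}")
```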