y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#neural-circuits News & Analysis

2 articles tagged with #neural-circuits. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

2 articles
AINeutralarXiv โ€“ CS AI ยท Apr 77/10
๐Ÿง 

How Alignment Routes: Localizing, Scaling, and Controlling Policy Circuits in Language Models

Researchers identified a sparse routing mechanism in alignment-trained language models where gate attention heads detect content and trigger amplifier heads that boost refusal signals. The study analyzed 9 models from 6 labs and found this routing mechanism distributes at scale while remaining controllable through signal modulation.

AIBullisharXiv โ€“ CS AI ยท Mar 167/10
๐Ÿง 

Disentangling Recall and Reasoning in Transformer Models through Layer-wise Attention and Activation Analysis

Researchers used mechanistic interpretability techniques to demonstrate that transformer language models have distinct but interacting neural circuits for recall (retrieving memorized facts) and reasoning (multi-step inference). Through controlled experiments on Qwen and LLaMA models, they showed that disabling specific circuits can selectively impair one ability while leaving the other intact.