y0news

#mechanistic-explainability News & Analysis

1 article tagged with #mechanistic-explainability. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · 18h ago · 6/10

Neuron-Anchored Rule Extraction for Large Language Models via Contrastive Hierarchical Ablation

Researchers introduce MechaRule, a novel method for extracting interpretable symbolic rules from large language models by identifying and ablating sparse neuron activations that drive specific behaviors. The technique achieves 97% recall of high-impact neurons while requiring only 2.14% of the computational cost of exhaustive ablation, demonstrating effectiveness on arithmetic reasoning and jailbreak detection tasks.
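
The cost saving over exhaustive ablation can be pictured with a toy sketch. This is not MechaRule's actual algorithm, which the summary does not describe in detail; it only illustrates the general idea of ablating groups of neurons and splitting the groups that matter. Everything here is an assumption made for illustration: the names `make_toy_effect` and `hierarchical_ablation`, the additive effect model, the threshold, and the neuron indices. The contrastive part of the method is not modeled.

```python
import random


def make_toy_effect(n_neurons, high_impact, seed=0):
    """Build a stand-in for 'ablate these neurons and measure the behavior change'.

    Toy assumption: effects are additive and non-negative, so a group's effect
    is the sum of its members' effects. Real networks need not behave this way.
    """
    rng = random.Random(seed)
    # Background neurons get a tiny effect; the chosen ones get a large effect.
    impact = [rng.uniform(0.0, 1e-5) for _ in range(n_neurons)]
    for i in high_impact:
        impact[i] = 1.0
    calls = {"count": 0}  # number of ablation evaluations performed

    def effect(neurons):
        calls["count"] += 1
        return sum(impact[i] for i in neurons)

    return effect, calls


def hierarchical_ablation(neurons, effect, threshold):
    """Return neurons whose group-level ablation effect stays above threshold.

    Ablate the whole group first. If the effect is small, prune the group;
    otherwise split it in half and recurse until single neurons remain.
    """
    if effect(neurons) < threshold:
        return []
    if len(neurons) == 1:
        return list(neurons)
    mid = len(neurons) // 2
    return (hierarchical_ablation(neurons[:mid], effect, threshold)
            + hierarchical_ablation(neurons[mid:], effect, threshold))


N = 4096                              # hypothetical layer width
HIGH = [7, 300, 1024, 2047, 4000]     # hypothetical high-impact neurons
effect, calls = make_toy_effect(N, HIGH)
found = hierarchical_ablation(list(range(N)), effect, threshold=0.5)
recall = len(set(found) & set(HIGH)) / len(HIGH)
print(f"recall={recall:.2f}, ablations={calls['count']} vs exhaustive={N}")
```

In this toy setting every high-impact neuron is recovered with far fewer ablations than testing all 4096 neurons one by one, because groups with negligible effect are discarded without inspecting their members. The 97% recall and 2.14% cost figures above come from the paper's own experiments and are not reproduced by this sketch.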