#mechanistic-ai News & Analysis

5 articles tagged with #mechanistic-ai. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · May 12 · 7/10

Data-driven Circuit Discovery for Interpretability of Language Models

Researchers introduce Data-driven Circuit Discovery (DCD), a new framework for understanding language models that challenges the assumption that models implement tasks using a single computational circuit. By clustering data based on how models process examples, DCD discovers multiple task-specific circuits per dataset, revealing that existing methods conflate distinct mechanisms into single circuits and produce dataset-dependent rather than generalizable interpretations.
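The clustering step can be pictured with a toy sketch. The summary does not say which features or clustering algorithm DCD uses, so everything here is an assumption: each example is represented by a hypothetical vector of per-component importance scores, and a plain k-means with fixed starting centroids groups examples that the model appears to process in similar ways. A separate circuit would then be sought for each cluster.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, iterations=10):
    """Plain k-means with caller-supplied starting centroids (deterministic)."""
    centroids = [list(c) for c in initial_centroids]
    labels = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Move each centroid to the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical per-example importance scores for two model components.
# Examples 0-1 lean on component A, examples 2-3 on component B.
attributions = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels, centroids = kmeans(attributions, [attributions[0], attributions[2]])
print(labels)
```

In this toy data the two groups of examples fall into separate clusters, which is the situation where a single dataset-wide circuit would blend two distinct mechanisms.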

AI · Bullish · arXiv – CS AI · May 12 · 7/10

Do LLMs Experience an Internal Polylogue? Investigating Reasoning through the Lens of Personas

Researchers demonstrate that large language models encode behavioral traits as linear directions in activation space called "persona vectors," which can be monitored and manipulated during reasoning. By treating these vectors as signals that change over the course of generation, which the authors term a "polylogue," they predict answer accuracy on MMLU-Pro at a competitive level and enable stage-aware latent steering that improves model performance.
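A minimal sketch of what a linear trait direction means in practice. The summary does not describe how the paper extracts its persona vectors; the difference-of-means construction below is one common approach and is an assumption here, as are all function names and the toy activations.

```python
import math


def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def persona_direction(with_trait, without_trait):
    """Unit vector from the 'without trait' mean to the 'with trait' mean.

    Assumption: difference of means, not necessarily the paper's method.
    """
    diff = [a - b for a, b in zip(mean_vector(with_trait), mean_vector(without_trait))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def trait_score(activation, direction):
    """Monitoring: projection of an activation onto the trait direction."""
    return sum(a * d for a, d in zip(activation, direction))


def steer(activation, direction, alpha):
    """Steering: shift the activation by alpha along the trait direction."""
    return [a + alpha * d for a, d in zip(activation, direction)]


# Toy 3-dimensional activations (hypothetical values).
with_trait = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
without_trait = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
direction = persona_direction(with_trait, without_trait)

activation = [0.5, 2.0, 1.0]
before = trait_score(activation, direction)
after = trait_score(steer(activation, direction, 2.0), direction)
print(direction, before, after)
```

Computing `trait_score` at each generation step yields a time series per trait, which is the kind of dynamic signal the summary describes.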

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

Interpreting and Steering a Text-to-Speech Language Model with Sparse Autoencoders

Researchers use sparse autoencoders to interpret and steer CosyVoice3, a text-to-speech language model. The work shows that interpretable features (phonemes, laughter, accent, and speaker gender) are causally linked to speech output and can be steered to modify synthesis behavior without retraining.
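A toy sketch of feature steering with a sparse autoencoder. The summary does not give the paper's architecture or steering procedure, so this assumes a standard ReLU encoder with a linear decoder and steers by clamping one feature while preserving the reconstruction error. The weights and the "laughter" feature index are hypothetical.

```python
def encode(x, w_enc, b_enc):
    """Feature activations: ReLU(W_enc @ x + b_enc)."""
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(w_enc, b_enc)]
    return [max(0.0, p) for p in pre]


def decode(features, w_dec, b_dec):
    """Reconstruction: b_dec plus each feature times its decoder row."""
    out = list(b_dec)
    for f, row in zip(features, w_dec):
        for i, w in enumerate(row):
            out[i] += f * w
    return out


def steer_feature(x, index, value, w_enc, b_enc, w_dec, b_dec):
    """Clamp one feature to `value` and return the edited activation."""
    features = encode(x, w_enc, b_enc)
    # Keep whatever the autoencoder failed to reconstruct, so only the
    # chosen feature's contribution changes.
    recon = decode(features, w_dec, b_dec)
    error = [xi - ri for xi, ri in zip(x, recon)]
    edited = list(features)
    edited[index] = value
    return [s + e for s, e in zip(decode(edited, w_dec, b_dec), error)]


# Hypothetical weights: 2-dimensional activation, 3 features.
W_ENC = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
B_ENC = [0.0, 0.0, -10.0]  # the third feature stays off for small inputs
W_DEC = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
B_DEC = [0.0, 0.0]
LAUGHTER = 2  # hypothetical label for feature index 2

x = [1.0, 2.0]
features = encode(x, W_ENC, B_ENC)
steered = steer_feature(x, LAUGHTER, 4.0, W_ENC, B_ENC, W_DEC, B_DEC)
print(features, steered)
```

The edited activation would replace the original at the hooked layer during synthesis, so no model weights change.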

AI · Neutral · arXiv – CS AI · May 28 · 6/10

Geometry of Human Perceptual Domains Emerges Transiently in LLM Representations

Researchers discovered that large language models develop geometric structures in their internal representations that mirror human perceptual organization across domains like color, pitch, and emotion, despite training only on text. These perceptual geometries emerge transiently in intermediate layers, providing new insight into how LLMs develop human-like conceptual understanding without direct sensory supervision.

AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

FaCT: Faithful Concept Traces for Explaining Neural Network Decisions

Researchers introduce FaCT, an approach that explains neural network decisions through faithful concept-based explanations without relying on restrictive assumptions about how models learn. The work also proposes a new evaluation metric (C²-Score) and reports improved interpretability while maintaining competitive performance on ImageNet.