#neural-mechanisms News & Analysis

5 articles tagged with #neural-mechanisms. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · May 12 · 🔥 8/10

A Single Neuron Is Sufficient to Bypass Safety Alignment in Large Language Models

Researchers demonstrate that individual neurons in large language models can be manipulated to bypass safety mechanisms: suppressing a single neuron is sufficient to disable refusal behavior across multiple models. The finding suggests that safety alignment relies on discrete, identifiable neurons rather than distributed safeguards, raising critical questions about the robustness of current AI safety approaches.

AI · Neutral · arXiv – CS AI · Jun 1 · 7/10

Dual Mechanisms of Value Expression: Intrinsic vs. Prompted Values in Large Language Models

Researchers demonstrate that large language models express values through two distinct but partially overlapping mechanisms: intrinsic values learned during training and prompted values elicited by explicit instructions. Using mechanistic analysis of value vectors and neurons, the study reveals that while both mechanisms share common components, they serve different functions—intrinsic values promote response diversity while prompted values enforce instruction compliance.

AI · Neutral · arXiv – CS AI · Apr 15 · 7/10

Latent Planning Emerges with Scale

Researchers demonstrate that large language models develop internal planning representations that scale with model size, enabling them to implicitly plan future outputs without explicit verbalization. The study on Qwen-3 models (0.6B-14B parameters) reveals mechanistic evidence of latent planning through neural features that predict and shape token generation, with planning capabilities increasing consistently across model scales.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

In-Context Learning in Speech Language Models: Analyzing the Role of Acoustic Features, Linguistic Structure, and Induction Heads

Researchers investigate in-context learning (ICL) in speech language models, finding that speaking rate significantly affects both model performance and acoustic mimicry, and that induction heads play the same causal role as in text-based ICL. The study bridges the text and speech domains by analyzing how models learn from demonstrations in text-to-speech tasks.

AI · Neutral · arXiv – CS AI · Mar 17 · 6/10

Induction Signatures Are Not Enough: A Matched-Compute Study of Load-Bearing Structure in In-Context Learning

The study finds that synthetic data designed to enhance in-context learning in AI models does not necessarily improve performance. Targeted training can strengthen specific neural mechanisms, but it does not make them more functionally important than those that arise from natural training.
