
#attention-heads News & Analysis

5 articles tagged with #attention-heads. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · 5d ago · 7/10

Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training

Researchers demonstrate that post-training in reasoning models creates specialized attention heads that enable complex problem-solving, but this capability introduces trade-offs where sophisticated reasoning can degrade performance on simpler tasks. Different training methods (SFT, distillation, and GRPO) produce fundamentally different architectural mechanisms, revealing tensions between reasoning capability and computational reliability.

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

Steering at the Source: Style Modulation Heads for Robust Persona Control

Researchers have identified a method to control large language model behavior by targeting only three specific attention heads, called 'Style Modulation Heads', rather than the entire residual stream. This approach maintains model coherence while enabling precise persona and style control, offering a more efficient alternative to fine-tuning.

AI · Neutral · arXiv – CS AI · Mar 5 · 7/10

Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition

Researchers studied how large language models generalize to new tasks through "off-by-one addition" experiments, discovering a "function induction" mechanism that operates at higher abstraction levels than previously known induction heads. The study reveals that multiple attention heads work in parallel to enable task-level generalization, with this mechanism being reusable across various synthetic and algorithmic tasks.

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10

Dyslexify: A Mechanistic Defense Against Typographic Attacks in CLIP

Researchers developed Dyslexify, a training-free defense mechanism against typographic attacks on CLIP vision models that inject malicious text into images. The method selectively disables attention heads responsible for text processing, improving robustness by up to 22% while maintaining 99% of standard performance.
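The head-level intervention several of these papers rely on, ablating the output of chosen attention heads while leaving the rest of the forward pass untouched, can be sketched in a few lines. Below is a toy NumPy version; the shapes, the column-wise head split, and the `ablate` parameter are all illustrative and are not Dyslexify's actual implementation:

```python
import numpy as np

def multi_head_attention(x, Wq, Wk, Wv, n_heads, ablate=()):
    """Toy single-layer multi-head self-attention.

    x: (seq, d) input; Wq/Wk/Wv: (d, d) projection matrices.
    Heads whose index appears in `ablate` have their output zeroed,
    mimicking the "disable this head" intervention.
    """
    seq, d = x.shape
    dh = d // n_heads
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    outs = []
    for h in range(n_heads):
        s = slice(h * dh, (h + 1) * dh)          # this head's slice of the projections
        scores = q[:, s] @ k[:, s].T / np.sqrt(dh)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)       # softmax over key positions
        o = w @ v[:, s]
        if h in ablate:
            o = np.zeros_like(o)                 # ablate: zero this head's contribution
        outs.append(o)
    return np.concatenate(outs, axis=-1)
```

In a real model this is usually done with forward hooks on the attention modules of a pretrained network rather than by reimplementing attention, but the effect is the same: the ablated head contributes nothing while the remaining heads are untouched.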