y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#mechanistic-analysis News & Analysis

12 articles tagged with #mechanistic-analysis. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

12 articles
AINeutralarXiv – CS AI · Jun 27/10
🧠

MENTIS: What Belief Changes Under Alignment? Measuring Multi-Scale Latent Torsion in Language Models

Researchers introduce MENTIS, a framework for measuring internal geometric changes in language models during preference alignment training. The study reveals that alignment leaves selective, depth-localized signatures in model computations, with normative concepts showing larger internal reorganization than factual concepts across multiple model architectures.

AIBearisharXiv – CS AI · May 297/10
🧠

When and How Long? The Readout-Mediator Angle in Temporal Reasoning

Researchers demonstrate that linear probes can successfully decode information from neural networks while remaining completely disconnected from how models actually process that information. Using calendar-date reasoning tasks, they show that probes identifying day-of-year information are orthogonal to the causal mechanisms models use for duration reasoning, revealing a fundamental flaw in probe-based interpretability methods.

AINeutralarXiv – CS AI · May 277/10
🧠

Why LLMs Hallucinate on Structured Knowledge: A Mechanistic Analysis of Reasoning over Linearized Representations

Researchers have identified the mechanistic causes of hallucinations in large language models when reasoning over structured knowledge like graphs and tables. The study reveals that hallucinations stem from systematic failures in attention allocation and semantic grounding in feed-forward layers, rather than random errors, with findings applicable across multiple structured knowledge formats.

AIBullisharXiv – CS AI · May 127/10
🧠

When Language Overwrites Vision: Over-Alignment and Geometric Debiasing in Vision-Language Models

Researchers identify a fundamental geometric flaw in decoder-based Vision-Language Models where visual embeddings become over-aligned with linguistic patterns, causing systematic hallucinations. The study introduces quantitative methods to characterize this bias and proposes training-free and fine-tuning solutions that reduce hallucinations across multiple benchmarks without computational overhead.

AINeutralarXiv – CS AI · Apr 157/10
🧠

Latent Planning Emerges with Scale

Researchers demonstrate that large language models develop internal planning representations that scale with model size, enabling them to implicitly plan future outputs without explicit verbalization. The study on Qwen-3 models (0.6B-14B parameters) reveals mechanistic evidence of latent planning through neural features that predict and shape token generation, with planning capabilities increasing consistently across model scales.

AIBearisharXiv – CS AI · Apr 157/10
🧠

One Token Away from Collapse: The Fragility of Instruction-Tuned Helpfulness

Researchers demonstrate that instruction-tuned large language models suffer severe performance degradation when subject to simple lexical constraints like banning a single punctuation mark or common word, losing 14-48% of response quality. This fragility stems from a planning failure where models couple task competence to narrow surface-form templates, affecting both open-weight and commercially deployed closed-weight models like GPT-4o-mini.

🧠 GPT-4
AINeutralarXiv – CS AI · Apr 147/10
🧠

METER: Evaluating Multi-Level Contextual Causal Reasoning in Large Language Models

Researchers introduce METER, a benchmark that evaluates Large Language Models' ability to perform contextual causal reasoning across three hierarchical levels within unified settings. The study identifies critical failure modes in LLMs: susceptibility to causally irrelevant information and degraded context faithfulness at higher causal levels.

AINeutralarXiv – CS AI · Jun 26/10
🧠

Masking Stale Observations Helps Search Agents -- Until It Doesn't: A Regime Map and Its Mechanism

Researchers systematically studied how masking outdated information improves long-horizon search agents' efficiency, finding that benefits follow an inverted-U pattern dependent on model capacity and retriever quality. The effect collapses when models become saturated, revealing that context management success depends on balancing retriever performance with a model's implicit filtering capacity rather than either factor alone.

AINeutralarXiv – CS AI · May 286/10
🧠

The Grammar of Transformers: A Systematic Review of Interpretability Research on Syntactic Knowledge in Language Models

A comprehensive systematic review of 337 studies examines how Transformer-based language models encode syntactic knowledge, finding strong performance on formal syntax but variable results at the syntax-semantics interface. The research reveals that while these models demonstrate non-trivial syntactic abilities through behavioral and mechanistic evidence, understanding the detailed computational mechanisms remains limited due to methodological heterogeneity and heavy concentration on English and BERT-like architectures.

AINeutralarXiv – CS AI · May 276/10
🧠

How Chain-of-Thought Works? Tracing Information Flow from Decoding, Projection, and Activation

Researchers have developed a mechanistic interpretability framework that reverses information flow through Chain-of-Thought prompting to understand how AI models reason. The study reveals CoT functions as a decoding space pruner that uses answer templates to guide outputs, with task-dependent neuron modulation that reduces activation in open-domain tasks but increases it in closed-domain scenarios.

AINeutralarXiv – CS AI · Apr 146/10
🧠

A Mechanistic Analysis of Looped Reasoning Language Models

Researchers conducted a mechanistic analysis of looped reasoning language models, discovering that these recurrent architectures learn inference stages similar to feedforward models but execute them iteratively. The study reveals that recurrent blocks converge to distinct fixed points with stable attention behavior, providing architectural insights for improving LLM reasoning capabilities.