y0news

#model-behavior News & Analysis

34 articles tagged with #model-behavior. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Apr 20 · 6/10

Mechanisms of Prompt-Induced Hallucination in Vision-Language Models

Researchers identify specific attention heads in vision-language models that cause prompt-induced hallucinations, where models favor textual instructions over visual evidence. By ablating these identified heads, they reduce hallucinations by 40% without retraining, revealing model-specific mechanisms underlying this failure mode.

AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

Why Did Apple Fall: Evaluating Curiosity in Large Language Models

Researchers have developed a comprehensive evaluation framework based on human curiosity scales to assess whether large language models exhibit curiosity-driven learning. The study finds that LLMs demonstrate stronger knowledge-seeking than humans but remain conservative in uncertain situations, with curiosity correlating positively with improved reasoning and active learning capabilities.

AI · Bearish · arXiv – CS AI · Apr 6 · 6/10

What Is The Political Content in LLMs' Pre- and Post-Training Data?

Research reveals that large language models exhibit political biases stemming from systematically left-leaning training data, with pre-training datasets containing more politically engaged content than post-training data. The study finds strong correlations between political stances in training data and model behavior, with biases persisting across all training stages.

AI · Neutral · arXiv – CS AI · Mar 26 · 6/10

Inspection and Control of Self-Generated-Text Recognition Ability in Llama3-8b-Instruct

Researchers discovered that Llama3-8b-Instruct can reliably recognize its own generated text through a specific vector in its neural network that activates during self-authorship recognition. The study demonstrates this self-recognition ability can be controlled by manipulating the identified vector to make the model claim or disclaim authorship of any text.

🧠 Llama
AI · Neutral · arXiv – CS AI · Mar 9 · 6/10

ContextBench: Modifying Contexts for Targeted Latent Activation

Researchers have developed ContextBench, a new benchmark for evaluating methods that generate targeted inputs to trigger specific behaviors in language models. The study introduces enhanced Evolutionary Prompt Optimization techniques that better balance effectiveness in activating AI model features while maintaining linguistic fluency.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 8

Decoding Answers Before Chain-of-Thought: Evidence from Pre-CoT Probes and Activation Steering

New research reveals that large language models often determine their final answers before generating chain-of-thought reasoning, challenging the assumption that CoT reflects the model's actual decision process. Linear probes can predict model answers with an AUC of 0.9 before CoT generation, and steering these activations can flip answers in over 50% of cases.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 7

Personalization Increases Affective Alignment but Has Role-Dependent Effects on Epistemic Independence in LLMs

Research reveals that personalization in Large Language Models increases emotional validation but has complex effects on how models maintain their positions depending on their assigned role. When acting as advisors, personalized LLMs show greater independence, but as social peers, they become more susceptible to abandoning their positions when challenged.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 8

Transformers Remember First, Forget Last: Dual-Process Interference in LLMs

Research analyzing 39 large language models reveals that they exhibit proactive interference (retaining early information at the expense of recent information), unlike humans, who typically show retroactive interference. The pattern held across all tested LLMs, with larger models showing better resistance to retroactive interference but unchanged proactive interference patterns.

AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 25

Capabilities Ain't All You Need: Measuring Propensities in AI

Researchers introduce the first formal framework for measuring AI propensities (the tendencies of models to exhibit particular behaviors), going beyond traditional capability measurements. The new bilogistic approach successfully predicts AI behavior on held-out tasks and shows stronger predictive power when combining propensities with capabilities than when using either measure alone.
