AI · Neutral · arXiv – CS AI · Apr 20 · 6/10
🧠Researchers identify specific attention heads in vision-language models that cause prompt-induced hallucinations, where models favor textual instructions over visual evidence. By ablating these identified heads, they reduce hallucinations by 40% without retraining, revealing model-specific mechanisms underlying this failure mode.
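The summary does not say how the heads are ablated. The sketch below is a toy illustration assuming zero-ablation: a multi-head attention layer where the outputs of chosen heads are replaced with zeros before concatenation, so those heads contribute nothing downstream. The function names (`head_output`, `multi_head`) and the tiny hand-written inputs are hypothetical and are not the paper's code; mean-ablation, which substitutes an average activation instead of zeros, is a common alternative.

```python
import math
from typing import List, Set

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def head_output(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for one query in one head."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    dim_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]


def multi_head(
    queries: List[Vector],
    keys: List[List[Vector]],
    values: List[List[Vector]],
    ablate: Set[int],
) -> Vector:
    """Concatenate per-head outputs, zeroing every head index in `ablate`."""
    out: Vector = []
    for h, (q, ks, vs) in enumerate(zip(queries, keys, values)):
        o = head_output(q, ks, vs)
        if h in ablate:
            # Zero-ablation (an assumption): the head's output is discarded.
            o = [0.0] * len(o)
        out.extend(o)
    return out


# Usage: two heads, each attending over two positions.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
values = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
full = multi_head(queries, keys, values, ablate=set())
ablated = multi_head(queries, keys, values, ablate={1})
```

Because ablation only masks outputs at inference time, no weights change, which is consistent with the summary's point that the reduction comes "without retraining".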
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers have developed a comprehensive evaluation framework based on human curiosity scales to assess whether large language models exhibit curiosity-driven learning. The study finds that LLMs show stronger knowledge-seeking than humans but remain conservative in uncertain situations, and that curiosity correlates positively with improved reasoning and active learning capabilities.
AI · Bearish · arXiv – CS AI · Apr 6 · 6/10
🧠Research reveals that large language models exhibit political biases stemming from systematically left-leaning training data, with pre-training datasets containing more politically engaged content than post-training data. The study finds strong correlations between political stances in training data and model behavior, with biases persisting across all training stages.
AI · Neutral · arXiv – CS AI · Mar 26 · 6/10
🧠Researchers discovered that Llama3-8b-Instruct can reliably recognize its own generated text through a specific vector in its neural network that activates during self-authorship recognition. The study demonstrates this self-recognition ability can be controlled by manipulating the identified vector to make the model claim or disclaim authorship of any text.
🧠 Llama
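The claim that a single vector both signals and controls self-recognition can be illustrated with a toy activation-steering sketch. It assumes, hypothetically, that authorship is read out as the projection of a hidden state onto a unit direction and that steering means adding a scaled copy of that direction. The names `steer` and `claims_authorship` and the threshold readout are illustrative only; in the actual study the model's claim would be observed through its generated text, not a threshold.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Add alpha times the unit direction to a hidden state (activation steering)."""
    u = normalize(direction)
    return [h + alpha * ui for h, ui in zip(hidden, u)]


def claims_authorship(hidden: Vector, direction: Vector, threshold: float = 0.0) -> bool:
    """Hypothetical readout: project onto the direction and compare to a threshold."""
    return dot(hidden, normalize(direction)) > threshold


# Usage: a negative projection flips to positive after steering with alpha > 0,
# and back again with a negative alpha.
direction = [1.0, 0.0, 0.0]
hidden = [-0.5, 0.2, 0.1]
pushed = steer(hidden, direction, alpha=2.0)    # [1.5, 0.2, 0.1]
pulled = steer(pushed, direction, alpha=-2.0)   # back to [-0.5, 0.2, 0.1]
```

A positive `alpha` pushes the model toward claiming authorship and a negative one toward disclaiming it, mirroring the two-way control described in the summary.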
AI · Neutral · arXiv – CS AI · Mar 9 · 6/10
🧠Researchers have developed ContextBench, a new benchmark for evaluating methods that generate targeted inputs to trigger specific behaviors in language models. The study introduces enhanced Evolutionary Prompt Optimization techniques that better balance how strongly the generated inputs activate model features against how linguistically fluent they remain.
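The trade-off between activating a feature and staying fluent can be pictured as a search over token sequences with a combined score. The sketch below is a generic elitist evolutionary search with a scalarized objective (activation plus a weighted fluency term); it is not the paper's Evolutionary Prompt Optimization. The scoring functions `toy_activation` and `toy_fluency` are hypothetical stand-ins for a real model's feature activation and a fluency measure such as log-likelihood.

```python
import random
from typing import Callable, List

Tokens = List[str]


def evolve(
    seed_tokens: Tokens,
    vocab: List[str],
    activation: Callable[[Tokens], float],
    fluency: Callable[[Tokens], float],
    fluency_weight: float = 0.5,
    population: int = 16,
    generations: int = 30,
    rng_seed: int = 0,
) -> Tokens:
    """Elitist evolutionary search maximizing activation + fluency_weight * fluency."""
    rng = random.Random(rng_seed)

    def fitness(p: Tokens) -> float:
        return activation(p) + fluency_weight * fluency(p)

    def mutate(p: Tokens) -> Tokens:
        # Single-token substitution at a random position.
        child = list(p)
        child[rng.randrange(len(child))] = rng.choice(vocab)
        return child

    pop = [list(seed_tokens) for _ in range(population)]
    for _ in range(generations):
        children = [mutate(rng.choice(pop)) for _ in range(population)]
        # Keeping the best of parents + children means the top score never drops.
        pop = sorted(pop + children, key=fitness, reverse=True)[:population]
    return pop[0]


# Hypothetical stand-ins: "activation" counts a target token,
# "fluency" penalizes immediately repeated tokens.
def toy_activation(tokens: Tokens) -> float:
    return float(tokens.count("cat"))


def toy_fluency(tokens: Tokens) -> float:
    return -float(sum(1 for a, b in zip(tokens, tokens[1:]) if a == b))


seed = ["the", "dog", "sat", "on", "the", "mat"]
vocab = ["the", "cat", "dog", "sat", "on", "mat"]
best = evolve(seed, vocab, toy_activation, toy_fluency)
```

Raising `fluency_weight` favors readable inputs at the cost of weaker activation; sweeping it traces out the kind of trade-off the summary describes.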
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 8
🧠New research reveals that large language models often determine their final answers before generating chain-of-thought reasoning, challenging the assumption that CoT reflects the model's actual decision process. Linear probes can predict model answers with an AUC of 0.9 before CoT generation, and steering these activations can flip answers in over 50% of cases.
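An AUC of 0.9 means that, given one example with each answer label, the probe ranks them correctly about 90% of the time. The sketch below shows a linear probe score (a weighted sum of activations plus a bias) and a rank-based AUC computed by comparing every positive/negative pair, with ties counting half. The weights and activations are made-up toy values; the paper's probe training is not reproduced here.

```python
from typing import List, Sequence


def probe_scores(activations: Sequence[Sequence[float]],
                 weights: Sequence[float], bias: float) -> List[float]:
    """Linear probe: score = w . x + b for each activation vector x."""
    return [sum(w * a for w, a in zip(weights, act)) + bias for act in activations]


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Usage with toy activations taken, hypothetically, before any CoT tokens.
scores = probe_scores([[1.0, 0.0], [0.0, 1.0]], weights=[2.0, -1.0], bias=0.5)
auc = roc_auc([0.9, 0.2, 0.3, 0.1], [1, 1, 0, 0])
```

The pairwise loop is O(n·m) and is meant for clarity; for large evaluations a sorting-based computation gives the same value.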
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 7
🧠Research reveals that personalization in Large Language Models increases emotional validation but has complex effects on how models maintain their positions depending on their assigned role. When acting as advisors, personalized LLMs show greater independence, but as social peers, they become more susceptible to abandoning their positions when challenged.
AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 8
🧠Research analyzing 39 large language models reveals that they exhibit proactive interference (favoring earlier information over more recent information), unlike humans, who typically show retroactive interference. The study found this pattern in every tested LLM, with larger models showing better resistance to retroactive interference but unchanged proactive interference.
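One common way to probe proactive interference is to stream repeated updates to the same keys and then ask for each key's latest value; answering with an older value counts as a proactive intrusion. The sketch below builds such a stream and classifies answers. It is an assumed, generic setup, not the study's protocol: `build_update_stream` and `classify_answer` are hypothetical helpers, and querying an actual model is left out.

```python
import random
from typing import Dict, List, Tuple


def build_update_stream(
    keys: List[str], updates_per_key: int, seed: int = 0
) -> Tuple[List[Tuple[str, int]], Dict[str, List[int]]]:
    """Interleave repeated value updates per key; return the stream and per-key history."""
    rng = random.Random(seed)
    history: Dict[str, List[int]] = {k: [] for k in keys}
    stream: List[Tuple[str, int]] = []
    for _ in range(updates_per_key):
        for k in keys:
            v = rng.randint(0, 999)
            history[k].append(v)
            stream.append((k, v))
    return stream, history


def classify_answer(key: str, answer: int, history: Dict[str, List[int]]) -> str:
    """Label an answer to 'what is the latest value of key?'."""
    values = history[key]
    if answer == values[-1]:
        return "correct"
    if answer in values[:-1]:
        # An older, overwritten value was retrieved: proactive interference.
        return "proactive_intrusion"
    return "other"


# Usage: the stream would be rendered into a prompt; here we only score answers.
stream, history = build_update_stream(["a", "b"], updates_per_key=3)
```

Counting the share of `proactive_intrusion` labels as the number of updates per key grows gives a simple interference curve per model.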
AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 25
🧠Researchers introduce the first formal framework for measuring AI propensities, the tendencies of models to exhibit particular behaviors, going beyond traditional capability measurements. The new bilogistic approach successfully predicts AI behavior on held-out tasks, and combining propensities with capabilities predicts behavior better than using either measure alone.