y0news

#model-bias News & Analysis

11 articles tagged with #model-bias. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · 1d ago · 7/10

Cherry-pick Override: Unsafe Directional Commitment in LLM Judges under Mixed Evidence

Researchers identify a critical failure mode called Cherry-pick Override (CCO) where large language model judges make unsafe directional commitments when evaluating mixed evidence containing both supporting and refuting claims. The study demonstrates that LLM judges incorrectly return definitive verdicts on over 84% of conflicting-evidence cases instead of acknowledging ambiguity, with panel voting amplifying rather than mitigating this bias.

AI · Bearish · arXiv – CS AI · 2d ago · 7/10

Does Topic Sentiment Cause Perceived Ideology? Comparing Human and LLM Annotations in Political News Articles

A research study compares how human annotators and large language models (GPT-4o-mini, Llama-3.3-70B) assign political ideology labels to news articles, finding that fine-tuned GPT-4o-mini models develop spurious correlations between sentiment and ideology that don't exist in human judgment. This reveals a critical vulnerability in using LLM annotations as training data for downstream tasks.

🧠 GPT-4 · 🧠 Llama
AI · Neutral · arXiv – CS AI · May 29 · 7/10

PRAIB: Peer Review AI Benchmark of Behaviour of LLM-Assisted Reviewing

Researchers introduce PRAIB, a benchmark framework that evaluates how Large Language Models perform peer review compared to human reviewers. Analysis of 11,000 LLM-generated reviews across major AI conferences reveals significant behavioral divergences: LLM ratings show less variability, positive bias, overconfidence, and frequently miss atomic weaknesses that human reviewers catch.

AI · Bearish · arXiv – CS AI · May 27 · 7/10

Turning Bias into Bugs: Bandit-Guided Style Manipulation Attacks on LLM Judges

Researchers demonstrate BITE, a black-box adversarial attack framework that exploits stylistic biases in LLM judges to artificially inflate evaluation scores while preserving semantic meaning. The attack achieves over 65% success rates across diverse LLM judges and tasks, exposing fundamental vulnerabilities in using language models for objective evaluation.

AI · Bearish · arXiv – CS AI · May 12 · 7/10

Weight Pruning Amplifies Bias: A Multi-Method Study of Compressed LLMs for Edge AI

A comprehensive empirical study finds that weight pruning, a technique for compressing large language models to run on edge devices, amplifies bias even while standard performance metrics are preserved. Activation-aware pruning methods maintain perplexity but increase stereotype reliance by up to 84%, suggesting that current evaluation methods fail to detect fairness degradation in compressed models.

AI · Neutral · arXiv – CS AI · Mar 12 · 7/10

Lost in the Middle at Birth: An Exact Theory of Transformer Position Bias

Researchers show that the 'Lost in the Middle' phenomenon in transformer models, in which content at the beginning and end of the context is used well but content in the middle is handled poorly, is an inherent architectural property present even before training begins. The U-shaped performance bias stems from the mathematical structure of causal decoders with residual connections, which creates a 'factorial dead zone' in middle positions.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

The Masked Advantage: Uncovering Local-Language Access to Cultural Knowledge in LLMs

Researchers developed a framework separating language proficiency from cultural knowledge access in large language models across 13 locales and 80 models. The study reveals that while English outperforms local languages on culture-agnostic questions, local languages consistently show advantages for accessing culture-specific knowledge once proficiency gaps are controlled for. This finding challenges the assumption that weaker local-language LLM performance indicates weaker cultural knowledge.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

Alignment Makes Language Models Normative, Not Descriptive

Research comparing 120 base and aligned language model pairs reveals that alignment training makes models more normative but less descriptive of actual human behavior. Base models predict real human choices in multi-round strategic games 10 times better, while aligned models excel only in single-shot, textbook scenarios where human behavior follows rational expectations.

AI · Bearish · arXiv – CS AI · Apr 13 · 6/10

Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces

Researchers introduce OmniBehavior, a benchmark for evaluating large language models' ability to simulate real-world human behavior across complex, long-horizon scenarios. The study finds that current LLMs struggle with authentic behavioral simulation and are systematically biased toward homogenized, overly positive personas, failing to capture individual differences and realistic long-tail behaviors.

AI · Bearish · arXiv – CS AI · Apr 10 · 6/10

The Impact of Steering Large Language Models with Persona Vectors in Educational Applications

Researchers studied how persona vectors (steering techniques that inject personality traits into large language models) affect educational applications such as essay generation and automated grading. The study found that persona steering significantly degrades answer quality, with substantially larger negative effects on open-ended humanities tasks than on factual science questions, and that AI scorers exhibit predictable bias patterns based on their assigned personality traits.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10

The First Impression Problem: Internal Bias Triggers Overthinking in Reasoning Models

Researchers identified 'internal bias' as a key cause of overthinking in AI reasoning models, where models form preliminary guesses that conflict with systematic reasoning. The study found that excessive attention to input questions triggers redundant reasoning steps, and current mitigation methods have proven ineffective.