
#llm-bias News & Analysis

15 articles tagged with #llm-bias. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · 4d ago · 7/10

Narrative over Numbers: The Identifiable Victim Effect and its Amplification Under Alignment and Reasoning in Large Language Models

Researchers tested whether large language models exhibit the Identifiable Victim Effect (IVE), a well-documented cognitive bias where people prioritize helping a specific individual over a larger group facing equal hardship. Across 51,955 API trials spanning 16 frontier models, instruction-tuned LLMs showed amplified IVE compared to humans, while reasoning-specialized models inverted the effect, raising critical concerns about AI deployment in humanitarian decision-making.

๐Ÿข OpenAI๐Ÿข Anthropic๐Ÿข xAI
AI · Bearish · arXiv – CS AI · 4d ago · 7/10

Fragile Preferences: A Deep Dive Into Order Effects in Large Language Models

Researchers conducted the first systematic study of order bias in large language models used for high-stakes decision-making, finding that LLMs exhibit strong position effects and previously undocumented name biases that can lead them to select strictly inferior options. The study identifies distinct failure modes in AI decision-support systems and proposes mitigation strategies that adjust the temperature parameter to recover the models' underlying preferences.
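
The abstract doesn't spell out the audit protocol, so the snippet below is only a minimal sketch of the basic position-swap check it implies: present the same pair of options in both orders and see whether the winner tracks content or position. The choose() wrapper, trial count, and toy chooser are illustrative assumptions, not the paper's setup.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple

def audit_order_bias(pairs: Iterable[Tuple[str, str]],
                     choose: Callable[[str, str], str],
                     trials: int = 10) -> float:
    """Fraction of option pairs whose winner changes when the
    presentation order is swapped (0.0 = order-invariant)."""
    pairs = list(pairs)
    flips = 0
    for a, b in pairs:
        forward = Counter(choose(a, b) for _ in range(trials)).most_common(1)[0][0]
        reverse = Counter(choose(b, a) for _ in range(trials)).most_common(1)[0][0]
        if forward != reverse:  # preference followed position, not content
            flips += 1
    return flips / len(pairs)

# Toy stand-in for a model call: always picks whichever option is listed first,
# i.e. a maximally position-biased chooser.
first_option = lambda a, b: a
print(audit_order_bias([("plan A", "plan B"), ("drug X", "drug Y")], first_option))  # 1.0
```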

AI · Bearish · arXiv – CS AI · 5d ago · 7/10

LLM Nepotism in Organizational Governance

Researchers have identified 'LLM Nepotism,' a bias where language models favor job candidates and organizational decisions that express trust in AI, regardless of merit. This creates self-reinforcing cycles where AI-trusting organizations make worse decisions and delegate more to AI systems, potentially compromising governance quality across sectors.

AI · Bearish · arXiv – CS AI · 5d ago · 7/10

IatroBench: Pre-Registered Evidence of Iatrogenic Harm from AI Safety Measures

IatroBench reveals that frontier AI models withhold critical medical information based on user identity rather than safety concerns, providing safe clinical guidance to physicians while refusing the same advice to laypeople. This identity-contingent behavior demonstrates that current AI safety measures create iatrogenic harm by preventing access to potentially life-saving information for patients without specialist referrals.

🧠 GPT-5 · 🧠 Llama
AI · Bearish · arXiv – CS AI · 5d ago · 7/10

Who Gets Which Message? Auditing Demographic Bias in LLM-Generated Targeted Text

Researchers systematically analyzed how leading LLMs (GPT-4o, Llama-3.3, Mistral-Large-2.1) generate demographically targeted messaging and found consistent gender- and age-based biases: messages targeted at men and younger audiences emphasize agency, while those targeted at women and older audiences stress tradition and care. The study shows how demographic stereotypes intensify in realistic targeting scenarios, highlighting critical fairness concerns for AI-driven personalized communication.

🧠 GPT-4 · 🧠 Llama
AI · Neutral · arXiv – CS AI · 6d ago · 7/10

When Identity Skews Debate: Anonymization for Bias-Reduced Multi-Agent Reasoning

Researchers present a framework for identifying and mitigating identity bias in multi-agent debate systems where LLMs exchange reasoning. The study shows that agents suffer from sycophancy (adopting peer views) and self-bias (ignoring peers), both of which undermine debate reliability, and proposes response anonymization to force agents to evaluate arguments on their merits rather than on source identity.
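
A rough sketch of the anonymization step, assuming each debate round is a list of (agent_id, argument) pairs that gets rewritten before being shown to agents in the next round; the helper name, labels, and example arguments are illustrative, not the paper's implementation.

```python
import random

def anonymize_round(responses):
    """Drop agent identities and shuffle ordering so the next round's
    prompt contains arguments only, with no self/peer attribution."""
    arguments = [text for _, text in responses]
    random.shuffle(arguments)
    return "\n".join(f"Argument {i + 1}: {text}" for i, text in enumerate(arguments))

round_one = [
    ("agent_a", "The constraint in step 2 rules out answer (b)."),
    ("agent_b", "I get answer (b); the step 2 constraint only binds in step 3."),
    ("agent_c", "Recomputing step 2 still gives answer (c)."),
]

# This block replaces the usual "Agent A said ..." transcript in the next prompt.
print(anonymize_round(round_one))
```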

AI · Neutral · arXiv – CS AI · Apr 10 · 7/10

Invisible Influences: Investigating Implicit Intersectional Biases through Persona Engineering in Large Language Models

Researchers introduced BADx, a novel metric that measures how large language models amplify implicit biases when adopting different social personas, revealing that popular LLMs like GPT-4o and DeepSeek-R1 exhibit significant context-dependent bias shifts. The study, spanning five state-of-the-art models, demonstrates that static bias testing methods fail to capture this dynamic bias amplification, with implications for AI safety and responsible deployment (a toy version of such a persona-conditioned audit is sketched below).

🧠 GPT-4 · 🧠 Claude
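
The summary doesn't define how BADx is computed, so the snippet below only illustrates the general shape of a persona-conditioned audit: measure a bias rate with and without a persona prefix and report the shift. The generate() and classify_biased() callables are hypothetical hooks, not part of the paper.

```python
from typing import Callable, Optional, Sequence

def bias_rate(prompts: Sequence[str],
              generate: Callable[[str], str],
              classify_biased: Callable[[str], bool],
              persona: Optional[str] = None) -> float:
    """Share of completions flagged as biased, optionally under a persona prefix."""
    prefix = f"You are {persona}. " if persona else ""
    flagged = sum(classify_biased(generate(prefix + p)) for p in prompts)
    return flagged / len(prompts)

def persona_shift(prompts, generate, classify_biased, persona) -> float:
    """Positive values mean the persona amplifies bias relative to the default voice."""
    return (bias_rate(prompts, generate, classify_biased, persona)
            - bias_rate(prompts, generate, classify_biased))

# Toy stand-ins so the sketch runs without any model access.
toy_generate = lambda prompt: "stereotyped" if "retired engineer" in prompt else "neutral"
toy_detector = lambda text: text == "stereotyped"
print(persona_shift(["Describe your new neighbor."], toy_generate, toy_detector,
                    persona="a retired engineer"))  # 1.0
```
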
AI · Neutral · arXiv – CS AI · Mar 17 · 7/10

FAIRGAME: a Framework for AI Agents Bias Recognition using Game Theory

Researchers have introduced FAIRGAME, a new framework that uses game theory to identify biases in AI agent interactions. The tool enables systematic discovery of biased outcomes in multi-agent scenarios based on different Large Language Models, languages used, and agent characteristics.
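
The summary stays high level, but the general recipe behind this kind of audit is concrete: have agents play a fixed game while varying only a surface attribute (model, language, persona) and compare outcomes that should be identical for an unbiased player. Below is a minimal sketch using a one-shot prisoner's dilemma, with a hypothetical decide(own_profile, opponent_profile) hook standing in for an LLM agent; the payoffs and profiles are illustrative, not FAIRGAME's configuration.

```python
# Row player's payoff in a one-shot prisoner's dilemma (C = cooperate, D = defect).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def average_payoff(decide, own_profile, opponent_profile, rounds=50):
    """Mean payoff for the first agent when both moves come from `decide`."""
    total = 0
    for _ in range(rounds):
        mine = decide(own_profile, opponent_profile)
        theirs = decide(opponent_profile, own_profile)
        total += PAYOFF[(mine, theirs)]
    return total / rounds

# Bias probe: identical strategy prompt, profiles differ only in stated language.
# An unbiased agent should earn the same payoff in both pairings below.
skewed_agent = lambda own, opponent: "D" if "English" in own else "C"
print(average_payoff(skewed_agent, "negotiator (English)", "negotiator (Italian)"))  # 5.0
print(average_payoff(skewed_agent, "negotiator (Italian)", "negotiator (English)"))  # 0.0
```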

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

Boosting In-Context Learning in LLMs Through the Lens of Classical Supervised Learning

Researchers propose Supervised Calibration (SC), a new framework that improves in-context learning performance in large language models by correcting systematic biases with optimal affine transformations in logit space (a toy illustration of the logit-space correction is sketched below). The method achieves state-of-the-art results across multiple LLMs, including Mistral-7B, Llama-2-7B, and Qwen2-7B, in few-shot learning scenarios.

🧠 Llama
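
The logit-space affine correction is the core idea, and its effect is easy to show on toy numbers: learn a scale and shift from a few labeled examples, then threshold the transformed logits. The fitting rule below (centering the midpoint between class-mean logits) is a deliberately simple stand-in for SC's optimal affine transform, not a reimplementation of the paper.

```python
import numpy as np

def fit_affine_calibration(logits, labels):
    """Fit z' = w*z + b so the midpoint between the average positive and
    average negative logit maps to the decision threshold 0."""
    mu_pos = logits[labels == 1].mean()
    mu_neg = logits[labels == 0].mean()
    w = 2.0 / (mu_pos - mu_neg)          # rescale the class separation
    b = -w * (mu_pos + mu_neg) / 2.0     # center the midpoint at 0
    return w, b

# Toy example: the raw in-context logits are shifted toward the positive class,
# so thresholding them at 0 labels everything positive.
raw = np.array([1.8, 2.4, 2.1, 1.5, 2.6, 1.9])
y   = np.array([0, 1, 1, 0, 1, 0])
w, b = fit_affine_calibration(raw, y)
print((raw > 0).astype(int))             # uncalibrated decisions: [1 1 1 1 1 1]
print((w * raw + b > 0).astype(int))     # calibrated decisions:   [0 1 1 0 1 0]
```
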
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10

Reward Models Inherit Value Biases from Pretraining

A comprehensive study of 10 leading reward models reveals they inherit significant value biases from their base language models, with Llama-based models preferring 'agency' values while Gemma-based models favor 'communion' values. This bias persists even when using identical preference data and training processes, suggesting that the choice of base model fundamentally shapes AI alignment outcomes.

AI · Neutral · arXiv – CS AI · 5d ago · 6/10

Network Effects and Agreement Drift in LLM Debates

Researchers examining LLM agent behavior in simulated debates discovered a phenomenon called 'agreement drift,' where AI agents systematically shift toward specific positions on opinion scales in ways that don't mirror human behavior. The study reveals critical biases in using LLMs as proxies for human social systems, particularly when modeling minority groups or unbalanced social contexts.

AI · Bearish · arXiv – CS AI · 6d ago · 6/10

Overstating Attitudes, Ignoring Networks: LLM Biases in Simulating Misinformation Susceptibility

Researchers found that large language models fail to accurately simulate human susceptibility to misinformation, consistently overstating how attitudes drive belief and sharing while ignoring social network effects. The study reveals systematic biases in how LLMs represent misinformation concepts, suggesting they are better tools for identifying where AI diverges from human judgment rather than replacing human survey responses.

AI · Bearish · arXiv – CS AI · Apr 10 · 6/10

Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts?

Researchers found that large language models experience accuracy drops of 0.3% to 5.9% when math problems are presented in unfamiliar cultural contexts, even when the underlying mathematical logic remains identical. Testing 14 models across culturally adapted variants of the GSM8K benchmark reveals that LLM mathematical reasoning is not culturally neutral, with errors stemming from both reasoning failures and calculation mistakes.

๐Ÿข OpenAI๐Ÿข Anthropic๐Ÿง  Claude
AI · Neutral · arXiv – CS AI · Mar 3 · 6/10

Evaluating and Mitigating LLM-as-a-judge Bias in Communication Systems

Researchers analyzed bias in 6 large language models used as autonomous judges in communication systems, finding that while current LLM judges show robustness to biased inputs, fine-tuning on biased data significantly degrades performance. The study identified 11 types of judgment biases and proposed four mitigation strategies for fairer AI evaluation systems.