#llm-bias News & Analysis

42 articles tagged with #llm-bias. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Mar 17 · 7/10

FAIRGAME: a Framework for AI Agents Bias Recognition using Game Theory

Researchers have introduced FAIRGAME, a framework that uses game theory to identify biases in interactions between AI agents. The tool enables systematic discovery of biased outcomes in multi-agent scenarios, which can vary with the underlying Large Language Model, the language used, and the agents' characteristics.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

Boosting In-Context Learning in LLMs Through the Lens of Classical Supervised Learning

Researchers propose Supervised Calibration (SC), a new framework to improve In-Context Learning performance in Large Language Models by addressing systematic biases through optimal affine transformations in logit space. The method achieves state-of-the-art results across multiple LLMs including Mistral-7B, Llama-2-7B, and Qwen2-7B in few-shot learning scenarios.

🧠 Llama
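
To make the idea of an affine transformation in logit space concrete, here is a minimal sketch in pure Python. It is not the paper's Supervised Calibration implementation: the function names, the toy logits, the learning rate, and the plain gradient-descent fit are all assumptions made for illustration. The sketch learns a matrix `W` and a bias `b` so that `softmax(W z + b)` fits a handful of labeled examples whose raw logits `z` are skewed toward class 0.

```python
import math

def softmax(z):
    # Numerically stable softmax over a list of scores.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def apply_affine(W, b, z):
    # Calibrated logits: W z + b.
    return [sum(W[i][j] * z[j] for j in range(len(z))) + b[i]
            for i in range(len(b))]

def fit_affine_calibration(logits, labels, lr=0.1, steps=1000):
    # Minimize mean cross-entropy of softmax(W z + b) by gradient descent,
    # starting from the identity map (i.e. the uncalibrated model).
    k, n = len(logits[0]), len(logits)
    W = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for _ in range(steps):
        gW = [[0.0] * k for _ in range(k)]
        gb = [0.0] * k
        for z, y in zip(logits, labels):
            p = softmax(apply_affine(W, b, z))
            for i in range(k):
                err = p[i] - (1.0 if i == y else 0.0)
                gb[i] += err / n
                for j in range(k):
                    gW[i][j] += err * z[j] / n
        for i in range(k):
            b[i] -= lr * gb[i]
            for j in range(k):
                W[i][j] -= lr * gW[i][j]
    return W, b

def accuracy(W, b, logits, labels):
    # Fraction of examples whose highest calibrated logit matches the label.
    correct = 0
    for z, y in zip(logits, labels):
        scores = apply_affine(W, b, z)
        pred = max(range(len(scores)), key=scores.__getitem__)
        correct += int(pred == y)
    return correct / len(labels)

# Toy, made-up logits: the raw model always scores class 0 higher,
# even on the two examples whose true label is class 1.
logits = [[2.0, 1.5], [2.2, 1.2], [3.0, 0.0], [2.8, 0.3]]
labels = [1, 1, 0, 0]

identity = [[1.0, 0.0], [0.0, 1.0]]
acc_before = accuracy(identity, [0.0, 0.0], logits, labels)
W, b = fit_affine_calibration(logits, labels)
acc_after = accuracy(W, b, logits, labels)
print(acc_before, acc_after)
```

In this toy setup the raw argmax always picks class 0, so any improvement comes entirely from the learned `W` and `b`. A real setup would fit them on held-out or in-context examples rather than on the data used for evaluation.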

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 3

Reward Models Inherit Value Biases from Pretraining

A comprehensive study of 10 leading reward models reveals they inherit significant value biases from their base language models, with Llama-based models preferring 'agency' values while Gemma-based models favor 'communion' values. This bias persists even when using identical preference data and training processes, suggesting that the choice of base model fundamentally shapes AI alignment outcomes.

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

Exposing the Unsaid: Visualizing Hidden LLM Bias through Stochastic Path Aggregation

Researchers introduce TreeTracer, a visual analytics tool that detects hidden biases in large language models by aggregating hundreds of stochastic generations into comparable hierarchical structures. The tool successfully exposes representational harms in LLMs like GPT-2 XL and demonstrates that standard single-output auditing methods fail to capture biases buried in lower-probability generation branches.
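
As a rough illustration of aggregating many sampled generations into one hierarchical structure, the sketch below builds a token prefix tree with counts. It is not TreeTracer's actual algorithm or data model: the function names, the dictionary-based tree, and the four toy generations are assumptions for illustration only.

```python
def build_prefix_tree(generations):
    # Each node records how many sampled generations pass through it.
    tree = {"count": 0, "children": {}}
    for tokens in generations:
        node = tree
        node["count"] += 1
        for tok in tokens:
            node = node["children"].setdefault(
                tok, {"count": 0, "children": {}}
            )
            node["count"] += 1
    return tree

def descend(tree, prefix):
    # Follow a token prefix down the tree and return the node it reaches.
    node = tree
    for tok in prefix:
        node = node["children"][tok]
    return node

def branch_shares(node):
    # Share of generations that continue with each next token at this node.
    total = node["count"]
    return {tok: child["count"] / total
            for tok, child in node["children"].items()}

# Toy, made-up samples standing in for repeated stochastic generations
# from the same prompt.
samples = [
    ["the", "nurse", "said", "she"],
    ["the", "nurse", "said", "she"],
    ["the", "nurse", "said", "he"],
    ["the", "nurse", "said", "they"],
]

tree = build_prefix_tree(samples)
shares = branch_shares(descend(tree, ["the", "nurse", "said"]))
print(shares)
```

Looking only at the most frequent continuation would hide the lower-probability branches, whereas the tree keeps every branch and its share side by side for comparison.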

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Isolating LLM Lexical Bias: A Curation-Free Triangulated Metric for Preference-Stage Learning

Researchers introduce the Triangulated Preference Shift score, an automated metric that identifies lexical biases introduced during preference learning stages (like RLHF) in large language models without requiring manual curation. The metric isolates language pattern shifts across six model families, revealing that preference tuning may push models toward a 'language of prestige' that diverges from natural human language usage.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

Do Large Language Models Encode Institutional Experience? Evidence from Cross-Linguistic Moral Reasoning Under Ambiguity

Researchers tested whether large language models inherit moral reasoning patterns from the institutional environments of the languages they were trained on. Across nine languages and six frontier LLMs, moral divergence emerged specifically in institutionally ambiguous scenarios and correlated with real-world institutional quality differences, suggesting language encodes institutional experience that influences AI decision-making.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Label Over Logic? How Source Cues Bias Human Fallacy Judgments More Than LLMs

A study comparing human and LLM reasoning found that source labels bias humans significantly more than LLMs when evaluating logical fallacies: LLM performance stays more consistent whether content is attributed to humans or to AI. This finding suggests LLMs could enhance human decision-making in AI-mediated environments by providing source-agnostic analysis.

🧠 GPT-5 · 🧠 Claude · 🧠 Sonnet

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Reducing Political Manipulation with Consistency Training

Researchers have identified systematic political bias in large language models and developed Political Consistency Training (PCT), a reinforcement learning method to mitigate covert political manipulation. The technique reduces asymmetric treatment of opposing political topics while maintaining overall model helpfulness.

AI · Bearish · arXiv – CS AI · May 27 · 6/10

Annotator Positionality as Signal: Psychometric Weighting for Anti-Autistic Ableism Detection

Researchers developed a bias-aware evaluation framework to detect anti-autistic ableism in large language models, using psychometrically-weighted annotations from autistic community members as ground truth. The study reveals that LLMs frequently produce harmful outputs, misclassify community language, and rely on surface-level keyword matching rather than contextual understanding of speaker identity and intent.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Seeing Like an AI: How LLMs Apply (and Misapply) Wikipedia Neutrality Norms

Researchers evaluated how large language models detect and correct biased Wikipedia edits according to the Neutral Point of View policy. LLMs achieved only 64% accuracy at bias detection but performed better at correction (79% word-removal accuracy), though they made extraneous changes beyond what human editors would make, revealing tensions between AI effectiveness and community standards.

AI · Neutral · arXiv – CS AI · May 4 · 6/10

Bring Your Own Prompts: Use-Case-Specific Bias and Fairness Evaluation for LLMs

Researchers present a decision framework and open-source library (langfair) for evaluating bias and fairness risks in Large Language Models across specific deployment contexts. The study demonstrates that fairness evaluation cannot rely on benchmark performance alone, as risks vary substantially depending on use case, prompt characteristics, and stakeholder priorities.

AI · Neutral · arXiv – CS AI · May 1 · 6/10

Mapping how LLMs debate societal issues when shadowing human personality traits, sociodemographics and social media behavior

Researchers have created Cognitive Digital Shadows (CDS), a 190,000-record synthetic dataset of LLM-generated responses on controversial societal topics, designed to measure how language models shift their outputs based on persona prompting and sociodemographic attributes. The dataset enables systematic auditing of LLM bias, alignment, and social sensitivity across 19 different models.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Network Effects and Agreement Drift in LLM Debates

Researchers examining LLM agent behavior in simulated debates discovered a phenomenon called 'agreement drift,' where AI agents systematically shift toward specific positions on opinion scales in ways that don't mirror human behavior. The study reveals critical biases in using LLMs as proxies for human social systems, particularly when modeling minority groups or unbalanced social contexts.

AI · Bearish · arXiv – CS AI · Apr 13 · 6/10

Overstating Attitudes, Ignoring Networks: LLM Biases in Simulating Misinformation Susceptibility

Researchers found that large language models fail to accurately simulate human susceptibility to misinformation, consistently overstating how strongly attitudes drive belief and sharing while ignoring social network effects. The study reveals systematic biases in how LLMs represent misinformation concepts, suggesting they are better suited to identifying where AI diverges from human judgment than to replacing human survey responses.

AI · Bearish · arXiv – CS AI · Apr 10 · 6/10

Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts?

Researchers found that large language models experience accuracy drops of 0.3% to 5.9% when math problems are presented in unfamiliar cultural contexts, even when the underlying mathematical logic remains identical. Testing 14 models across culturally adapted variants of the GSM8K benchmark reveals that LLM mathematical reasoning is not culturally neutral, with errors stemming from both reasoning failures and calculation mistakes.

🏢 OpenAI · 🏢 Anthropic · 🧠 Claude

AI · Bearish · arXiv – CS AI · Apr 7 · 6/10

Which English Do LLMs Prefer? Triangulating Structural Bias Towards American English in Foundation Models

A new study finds that major large language models exhibit systematic bias toward American English over British English across training data, tokenization, and outputs. The research introduces DiAlign, a method for measuring dialectal alignment, and finds evidence of linguistic homogenization that could impact global AI equity.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 4

Evaluating and Mitigating LLM-as-a-judge Bias in Communication Systems

Researchers analyzed bias in six large language models used as autonomous judges in communication systems, finding that current LLM judges are robust to biased inputs but that fine-tuning on biased data significantly degrades their performance. The study identified 11 types of judgment biases and proposed four mitigation strategies for fairer AI evaluation systems.
