AI · Bearish · arXiv – CS AI · 1d ago · 7/10
🧠Researchers evaluated large language models used in conversational tutoring systems and found they struggle to detect social biases in educational contexts while maintaining high confidence in incorrect assessments. The study reveals that LLMs are significantly more prone to biased behavior in naturalistic tutoring conversations than in controlled benchmarks, posing risks to student learning outcomes.
AI · Bearish · arXiv – CS AI · 1d ago · 7/10
🧠Researchers have developed a framework to measure and mitigate bias in code generated by large language models like GPT-4o and Gemini, using metrics called Code Bias Score and Attribute Change Ratio. The study finds that bias persists across protected attributes even after applying four mitigation strategies, indicating that more robust solutions are needed for AI-driven code generation systems.
🧠 GPT-4 · 🧠 Gemini
AI · Neutral · arXiv – CS AI · 1d ago · 7/10
🧠Researchers introduced IndoBias, a benchmark specifically designed to evaluate bias in Large Language Models across Indonesian and three local languages (Javanese, Sundanese, Makasar). The study reveals that existing LLMs exhibit significant bias toward prototypical Indonesian sentences and particularly strong bias in local languages regarding ideology and religion, highlighting the critical gap in bias research for culturally and linguistically diverse contexts.
AI · Bearish · arXiv – CS AI · 2d ago · 7/10
🧠A new arXiv study reveals that chain-of-thought reasoning in large language models is often unfaithful, with models generating plausible-sounding justifications that don't reflect their actual decision-making process. The research documents implicit biases where models systematically answer contradictory questions identically while rationalizing both answers coherently, affecting even frontier models and raising concerns for safety-critical applications.
🧠 Sonnet
AI · Bearish · arXiv – CS AI · 2d ago · 7/10
🧠A comprehensive study of four leading 2024 LLMs reveals significant gender, racial, and age biases in occupational and crime scenario depictions, with deviations up to 54% from real-world data. The research identifies a critical 'debiasing paradox' where efforts to reduce certain biases inadvertently over-correct and exacerbate other disparities, highlighting fundamental limitations in current bias mitigation techniques.
🧠 GPT-4 · 🧠 Claude · 🧠 Gemini
AI · Bearish · arXiv – CS AI · 6d ago · 7/10
🧠Researchers have identified and measured Vertical Integration Bias (VIB) in LLMs, where AI models affiliated with specific providers generate code favoring their provider's ecosystem over comparable alternatives. The study found significant bias in direct code generation (up to +18.8 percentage points) that amplifies dramatically in agentic workflows (up to +39.2 pp), raising concerns about vendor lock-in and reduced developer autonomy.
AI · Bearish · arXiv – CS AI · May 12 · 7/10
🧠Researchers developed a testing framework to study "political plasticity"—how Large Language Models adapt their ideological responses based on user context. The study found that newer, larger LLMs reliably shift responses along economic and personal freedom axes when prompted with few-shot examples, while older models show limited adaptability, raising concerns about potential data leakage and model reliability.
AI · Neutral · arXiv – CS AI · May 9 · 7/10
🧠Researchers developed a causal analysis framework to audit bias in Large Language Models across seven global models, revealing that Western AI systems exhibit higher refusal rates for specific demographics while Eastern models show low intervention rates with regional sensitivities. The study demonstrates that traditional fairness metrics significantly overestimate demographic bias by conflating cultural context with model behavior, challenging current approaches to AI safety evaluation.
🧠 Llama
AI · Bearish · arXiv – CS AI · May 7 · 7/10
🧠Research shows that Large Language Models exhibit measurable bias when their downstream purpose is revealed, even when generating supposedly task-independent metrics. This bias stems from human research design choices rather than algorithmic flaws, raising critical questions about how AI systems are deployed in financial and other sensitive domains.
AI · Neutral · arXiv – CS AI · May 4 · 7/10
🧠Researchers have identified severe social bias in code generated by large language models, with bias scores reaching 60.58% across four major models. They propose a Fairness Monitor Agent that reduces bias by 65.1% while improving code correctness, revealing that standard fairness interventions often amplify rather than mitigate demographic discrimination in AI-generated software.
AI · Neutral · arXiv – CS AI · May 1 · 7/10
🧠Researchers found that political bias measurements in large language models are significantly influenced by sycophancy—the models' tendency to adapt responses based on inferred user identity rather than reflecting fixed ideological positions. When prompted as if the questioner is a conservative Republican, six frontier LLMs shifted dramatically rightward, suggesting political bias audits conflate model behavior with user accommodation.
AI · Bearish · arXiv – CS AI · Apr 20 · 7/10
🧠Researchers found that large language models assigned personas exhibit motivated reasoning similar to humans, with accuracy in detecting misinformation reduced by up to 9% and political personas 90% more likely to evaluate scientific evidence favorably when it aligns with their induced identity. Standard debiasing prompts prove ineffective at mitigating these biases, raising concerns about LLMs amplifying identity-driven reasoning.
AI · Bearish · arXiv – CS AI · Apr 20 · 7/10
🧠Researchers audited LLMs from three major providers (OpenAI, Anthropic, Google) to assess content curation biases across Twitter/X, Bluesky, and Reddit. The study found that LLMs systematically amplify polarization, exhibit negative sentiment bias, and show a political leaning bias favoring left-leaning authors, with prompt design mitigating these effects to varying degrees.
🏢 OpenAI · 🏢 Anthropic · 🧠 GPT-4
AI · Bearish · arXiv – CS AI · Apr 15 · 7/10
🧠Researchers tested whether large language models exhibit the Identifiable Victim Effect (IVE)—a well-documented cognitive bias where people prioritize helping a specific individual over a larger group facing equal hardship. Across 51,955 API trials spanning 16 frontier models, instruction-tuned LLMs showed amplified IVE compared to humans, while reasoning-specialized models inverted the effect, raising critical concerns about AI deployment in humanitarian decision-making.
🏢 OpenAI · 🏢 Anthropic · 🏢 xAI
AI · Bearish · arXiv – CS AI · Apr 15 · 7/10
🧠Researchers conducted the first systematic study of order bias in Large Language Models used for high-stakes decision-making, finding that LLMs exhibit strong position effects and previously undocumented name biases that can lead to selection of strictly inferior options. The study reveals distinct failure modes in AI decision-support systems, with proposed mitigation strategies using temperature parameter adjustments to recover underlying preferences.
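One generic way to probe the position effects described above is to present the same options in every order and tally which option gets picked. The sketch below is not the study's protocol: `position_sensitivity`, the `choose` callback, and the `always_first` stub are hypothetical names introduced here for illustration, and a real probe would replace the stub with a call to the model under test.

```python
from collections import Counter
from itertools import permutations


def position_sensitivity(options, choose):
    """Return how often each option is selected across all orderings.

    `choose` is a hypothetical decision function: it receives the options
    in a given order and returns the index of the one it selects.
    """
    picks = Counter()
    total = 0
    for order in permutations(options):
        selected_index = choose(list(order))
        picks[order[selected_index]] += 1
        total += 1
    # An order-invariant chooser concentrates all mass on one option;
    # a purely position-driven chooser spreads it evenly.
    return {option: picks[option] / total for option in options}


def always_first(ordered_options):
    """Stub chooser with maximal position bias: always picks slot 0."""
    return 0
```

A chooser that selects by content puts a selection rate of 1.0 on a single option regardless of order, while `always_first` spreads its picks evenly across the options, which is the signature of a pure position effect.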
AI · Bearish · arXiv – CS AI · Apr 14 · 7/10
🧠IatroBench reveals that frontier AI models withhold critical medical information based on user identity rather than safety concerns, providing safe clinical guidance to physicians while refusing the same advice to laypeople. This identity-contingent behavior demonstrates that current AI safety measures create iatrogenic harm by preventing access to potentially life-saving information for patients without specialist referrals.
🧠 GPT-5 · 🧠 Llama
AI · Bearish · arXiv – CS AI · Apr 14 · 7/10
🧠Researchers have identified 'LLM Nepotism,' a bias where language models favor job candidates and organizational decisions that express trust in AI, regardless of merit. This creates self-reinforcing cycles where AI-trusting organizations make worse decisions and delegate more to AI systems, potentially compromising governance quality across sectors.
AI · Bearish · arXiv – CS AI · Apr 14 · 7/10
🧠Researchers systematically analyzed how leading LLMs (GPT-4o, Llama-3.3, Mistral-Large-2.1) generate demographically targeted messaging and found consistent gender and age-based biases, with male and youth-targeted messages emphasizing agency while female and senior-targeted messages stress tradition and care. The study demonstrates how demographic stereotypes intensify in realistic targeting scenarios, highlighting critical fairness concerns for AI-driven personalized communication.
🧠 GPT-4 · 🧠 Llama
AI · Neutral · arXiv – CS AI · Apr 13 · 7/10
🧠Researchers present a framework to identify and mitigate identity bias in multi-agent debate systems where LLMs exchange reasoning. The study reveals that agents suffer from sycophancy (adopting peer views) and self-bias (ignoring peers), undermining debate reliability, and proposes response anonymization as a solution to force agents to evaluate arguments on merit rather than source identity.
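Response anonymization, as summarized above, can be sketched as stripping the source identity from each debate message before peers see it. This is a minimal illustration under assumptions, not the paper's implementation: the message shape (`agent` and `text` keys), the `anonymize_responses` function, and the neutral "Participant N" labels are all hypothetical.

```python
import random


def anonymize_responses(responses, seed=None):
    """Return debate messages with agent identities replaced by neutral labels.

    `responses` is assumed to be a list of dicts with "agent" and "text" keys.
    The texts are shuffled so that list position does not leak the source.
    The input list is left unmodified.
    """
    rng = random.Random(seed)
    texts = [response["text"] for response in responses]
    rng.shuffle(texts)
    return [
        {"agent": f"Participant {i + 1}", "text": text}
        for i, text in enumerate(texts)
    ]
```

Shuffling is a design choice of this sketch: without it, a fixed ordering could still let an agent infer which peer wrote which argument.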
AI · Neutral · arXiv – CS AI · Apr 10 · 7/10
🧠Researchers introduced BADx, a novel metric that measures how Large Language Models amplify implicit biases when adopting different social personas, revealing that popular LLMs like GPT-4o and DeepSeek-R1 exhibit significant context-dependent bias shifts. The study across five state-of-the-art models demonstrates that static bias testing methods fail to capture dynamic bias amplification, with implications for AI safety and responsible deployment.
🧠 GPT-4 · 🧠 Claude
AI · Neutral · arXiv – CS AI · Mar 17 · 7/10
🧠Researchers have introduced FAIRGAME, a new framework that uses game theory to identify biases in AI agent interactions. The tool enables systematic discovery of biased outcomes in multi-agent scenarios based on different Large Language Models, languages used, and agent characteristics.
AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
🧠Researchers propose Supervised Calibration (SC), a new framework to improve In-Context Learning performance in Large Language Models by addressing systematic biases through optimal affine transformations in logit space. The method achieves state-of-the-art results across multiple LLMs including Mistral-7B, Llama-2-7B, and Qwen2-7B in few-shot learning scenarios.
🧠 Llama
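To make "affine transformations in logit space" concrete, the sketch below fits a simplified per-class scale and shift to a model's class logits by gradient descent on cross-entropy over a few labeled examples. It is an assumption-laden illustration, not the Supervised Calibration method itself: the diagonal (per-class) form, the function names, the learning rate, and the step count are all choices made here, and the paper's transformation may be more general.

```python
import math


def softmax(zs):
    """Numerically stable softmax over a list of logits."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def calibrate(z, scale, shift):
    """Apply a per-class affine map: z_k -> scale_k * z_k + shift_k."""
    return [a * v + b for a, v, b in zip(scale, z, shift)]


def fit_affine_calibration(logits, labels, lr=0.1, steps=500):
    """Fit scale and shift by full-batch gradient descent on cross-entropy."""
    n, k = len(logits), len(logits[0])
    scale, shift = [1.0] * k, [0.0] * k  # start from the identity map
    for _ in range(steps):
        g_scale, g_shift = [0.0] * k, [0.0] * k
        for z, y in zip(logits, labels):
            probs = softmax(calibrate(z, scale, shift))
            for j in range(k):
                err = probs[j] - (1.0 if j == y else 0.0)
                g_scale[j] += err * z[j] / n
                g_shift[j] += err / n
        scale = [a - lr * g for a, g in zip(scale, g_scale)]
        shift = [b - lr * g for b, g in zip(shift, g_shift)]
    return scale, shift


def predict(z, scale, shift):
    """Return the index of the largest calibrated logit."""
    c = calibrate(z, scale, shift)
    return max(range(len(c)), key=c.__getitem__)
```

Starting from the identity map means the uncalibrated predictions are the baseline, so any systematic offset toward one class shows up as a learned shift away from it.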
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10
🧠A comprehensive study of 10 leading reward models reveals they inherit significant value biases from their base language models, with Llama-based models preferring 'agency' values while Gemma-based models favor 'communion' values. This bias persists even when using identical preference data and training processes, suggesting that the choice of base model fundamentally shapes AI alignment outcomes.
AI · Neutral · arXiv – CS AI · 1d ago · 6/10
🧠Researchers introduce the Triangulated Preference Shift score, an automated metric that identifies lexical biases introduced during preference learning stages (like RLHF) in large language models without requiring manual curation. The metric isolates language pattern shifts across six model families, revealing that preference tuning may push models toward a 'language of prestige' that diverges from natural human language usage.
AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠Researchers tested whether large language models inherit moral reasoning patterns from the institutional environments of the languages they were trained on. Across nine languages and six frontier LLMs, moral divergence emerged specifically in institutionally ambiguous scenarios and correlated with real-world institutional quality differences, suggesting language encodes institutional experience that influences AI decision-making.