#bias-detection News & Analysis

18 articles tagged with #bias-detection. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · 3d ago · 7/10

Human-like in-group bias in instruction-tuned language model agents

A controlled study of instruction-tuned language model agents reveals they exhibit human-like in-group bias in multi-agent simulations, showing measurable discrimination based on group labels that accumulates into structural inequality over time. The bias operates subtly through resource allocation decisions rather than explicit negative actions, making it difficult to detect through standard auditing methods.

AI · Bearish · arXiv – CS AI · May 11 · 7/10

Quality-Conditioned Agreement in Automated Short Answer Scoring: Mid-Range Degradation and the Impact of Task-Specific Adaptation

Research reveals that AI models, particularly few-shot large language models, struggle significantly with mid-range quality responses in automated short answer scoring, while fine-tuned models and human experts maintain consistent performance across all quality levels. This degradation raises fairness concerns for students with developing understanding, emphasizing the need for quality-conditioned evaluation metrics.

🧠 GPT-4 · 🧠 GPT-5 · 🧠 Claude
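
The quality-conditioned evaluation this summary calls for can be illustrated with a minimal sketch: instead of one overall agreement figure, agreement with the human score is computed separately for each quality level. The function name, the 0 to 2 score scale, and the toy data are assumptions for illustration and are not taken from the paper.

```python
from collections import defaultdict


def agreement_by_quality(human_scores, model_scores):
    """Exact-match agreement rate, grouped by the human-assigned score."""
    if len(human_scores) != len(model_scores):
        raise ValueError("score lists must have the same length")
    hits = defaultdict(int)
    totals = defaultdict(int)
    for h, m in zip(human_scores, model_scores):
        totals[h] += 1
        hits[h] += int(h == m)
    return {level: hits[level] / totals[level] for level in sorted(totals)}


# Hypothetical toy data: 0 = incorrect, 1 = partially correct, 2 = correct.
human = [0, 0, 1, 1, 1, 1, 2, 2]
model = [0, 0, 1, 0, 2, 1, 2, 2]

per_level = agreement_by_quality(human, model)
overall = sum(h == m for h, m in zip(human, model)) / len(human)
print(per_level)  # agreement per quality level
print(overall)    # single aggregate figure
```

In this toy data the aggregate figure looks acceptable while the mid-range level agrees only half the time, which is the kind of gap a single overall metric can hide.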

AI · Bearish · arXiv – CS AI · May 7 · 7/10

Misaligned by Reward: Socially Undesirable Preferences in LLMs

Researchers found that reward models used to align large language models often fail to capture socially desirable preferences, preferring biased, unsafe, or unethical responses across domains like bias, safety, and morality. The study reveals a critical misalignment between how reward models are currently evaluated and their actual performance on social intelligence tasks, exposing a fundamental gap in LLM safety infrastructure.

AI · Neutral · arXiv – CS AI · Mar 17 · 7/10

FAIRGAME: a Framework for AI Agents Bias Recognition using Game Theory

Researchers have introduced FAIRGAME, a new framework that uses game theory to identify biases in AI agent interactions. The tool enables systematic discovery of biased outcomes in multi-agent scenarios based on different Large Language Models, languages used, and agent characteristics.

AI · Neutral · arXiv – CS AI · Mar 9 · 7/10

AdAEM: An Adaptively and Automated Extensible Measurement of LLMs' Value Difference

Researchers introduce AdAEM, a new evaluation algorithm that automatically generates test questions to better assess value differences and biases across Large Language Models. Unlike static benchmarks, AdAEM adaptively creates controversial topics that reveal more distinguishable insights about LLMs' underlying values and cultural alignment.

AI · Bullish · MIT News – AI · Feb 19 · 7/10 · 4

Exposing biases, moods, personalities, and abstract concepts hidden in large language models

MIT researchers have developed a new method to identify and expose hidden biases, moods, personalities, and abstract concepts within large language models. This breakthrough could help address LLM vulnerabilities and enhance both safety and performance of AI systems.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

CB-SLICE: Concept-Based Interpretable Error Slice Discovery

Researchers introduce CB-SLICE, a new method for identifying systematic errors in deep learning models by leveraging Concept Bottleneck Models to detect error patterns linked to human-understandable concepts. The approach outperforms existing techniques in uncovering model biases and provides more accurate, interpretable explanations of failure modes across multiple benchmarks.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

Persona Generators: Generating Diverse Synthetic Personas for Arbitrary Contexts

Researchers introduce Persona Generators, AI functions that create diverse synthetic populations for evaluating AI systems across varied user demographics without needing extensive real-world data collection. Using iterative optimization with large language models, the approach generates lightweight code that produces synthetic personas spanning rare trait combinations and long-tail behaviors, outperforming existing baselines on diversity metrics.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Fairness of Explanations in Artificial Intelligence (AI): A Unifying Framework, Axioms, and Future Direction toward Responsible AI

Researchers present a unified framework addressing a critical gap between algorithmic fairness and explainable AI (XAI): models can produce fair outputs while employing biased reasoning processes. The study introduces the concept of 'procedural bias' and proposes a conditional invariance framework to formalize and audit explanation fairness, establishing the first comprehensive taxonomy and evaluation workflow for this emerging field.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

The Pokémon Theorem and other Fairness Impossibility Results

Researchers demonstrate that multiple fairness impossibility results in machine learning share a common geometric structure rooted in RKHS theory, proving that fairness criteria become mathematically incompatible when base rates differ across groups. The work introduces the 'Pokémon theorem' showing any finite collection of linear fairness constraints leaves residual violations, with implications for fair AI systems in high-stakes applications.

🏢 Meta
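
The base-rate incompatibility mentioned in this summary can be illustrated with the well-known confusion-matrix identity FPR = p/(1-p) · (1-PPV)/PPV · (1-FNR), where p is a group's base rate. This is a minimal sketch of that classic identity, not the RKHS-based construction or the "Pokémon theorem" from the paper; the function name and the numbers are assumptions chosen for illustration.

```python
def implied_fpr(base_rate, ppv, fnr):
    """False positive rate forced by the identity
    FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR)."""
    if not (0 < base_rate < 1 and 0 < ppv <= 1 and 0 <= fnr <= 1):
        raise ValueError("base_rate in (0,1), ppv in (0,1], fnr in [0,1] required")
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)


# Hypothetical groups with equal PPV and equal FNR but different base rates.
fpr_a = implied_fpr(0.3, ppv=0.8, fnr=0.2)
fpr_b = implied_fpr(0.5, ppv=0.8, fnr=0.2)
print(fpr_a, fpr_b)  # the two false positive rates cannot coincide
```

Holding predictive value and false negative rate equal across the two groups forces their false positive rates apart whenever the base rates differ, so the three criteria cannot all be satisfied at once.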

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Prompt Engineering Strategies for LLM-based Qualitative Coding of Psychological Safety in Software Engineering Communities: A Controlled Empirical Study

Researchers conducted a controlled empirical study evaluating three LLMs (Claude Haiku, DeepSeek-Chat, Gemini 2.5 Flash) for qualitative coding of psychological safety in software engineering communities. Multi-shot prompting improved Claude Haiku's performance but not the others, while all models exhibited systematic biases in coding predictions, providing evidence-based guidelines for LLM-assisted qualitative research.

🧠 Claude · 🧠 Gemini

AI · Neutral · arXiv – CS AI · May 1 · 6/10

Automatic Causal Fairness Analysis with LLM-Generated Reporting

Researchers introduce FairMind, an automated tool that detects fairness bias in machine learning datasets using causal analysis and LLM-generated reports. The software applies the standard fairness model to evaluate how protected variables influence predictions through counterfactual reasoning, addressing a critical gap in existing AutoML frameworks that typically ignore fairness considerations.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

GLEaN: A Text-to-image Bias Detection Approach for Public Comprehension

Researchers introduce GLEaN, a visual explainability method that transforms complex AI bias detection into understandable portrait composites, enabling non-technical audiences to grasp how text-to-image models like Stable Diffusion XL associate occupations and identities with specific demographic characteristics.

🧠 Stable Diffusion

AI · Neutral · arXiv – CS AI · Mar 16 · 6/10

LLM BiasScope: A Real-Time Bias Analysis Platform for Comparative LLM Evaluation

Researchers have launched LLM BiasScope, an open-source web application that enables real-time bias analysis and side-by-side comparison of outputs from major language models including Google Gemini, DeepSeek, and Meta Llama. The platform uses a two-stage bias detection pipeline and provides interactive visualizations to help researchers and practitioners evaluate bias patterns across different AI models.

🏢 Hugging Face · 🧠 Gemini · 🧠 Llama

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 7

Measuring What VLMs Don't Say: Validation Metrics Hide Clinical Terminology Erasure in Radiology Report Generation

Researchers identify a critical flaw in Vision-Language Model evaluation for radiology, where high benchmark scores mask models' failure to generate clinically specific terminology. They propose new metrics including Clinical Association Displacement (CAD) to measure bias and clinical signal loss across demographic groups.

AI · Neutral · arXiv – CS AI · Mar 2 · 7/10 · 19

Biases in the Blind Spot: Detecting What LLMs Fail to Mention

Researchers have developed an automated pipeline to detect hidden biases in Large Language Models that don't appear in their reasoning explanations. The system discovered previously unknown biases like Spanish fluency and writing formality across seven LLMs in hiring, loan approval, and university admission tasks.

AI · Neutral · arXiv – CS AI · Mar 17 · 5/10

Jacobian Scopes: token-level causal attributions in LLMs

Researchers introduce Jacobian Scopes, a new gradient-based method for interpreting how individual tokens influence Large Language Model predictions. The technique uses perturbation theory and information geometry to reveal model biases, translation strategies, and learning mechanisms, with open-source implementations and an interactive demo available.

🏢 Hugging Face

AI · Neutral · Hugging Face Blog · Jun 26 · 5/10 · 4

Ethics and Society Newsletter #4: Bias in Text-to-Image Models

The article discusses bias in text-to-image AI models as part of the Ethics and Society Newsletter series.