AI · Bearish · arXiv – CS AI · 3d ago · 7/10
🧠A controlled study of instruction-tuned language model agents reveals they exhibit human-like in-group bias in multi-agent simulations, showing measurable discrimination based on group labels that accumulates into structural inequality over time. The bias operates subtly through resource allocation decisions rather than explicit negative actions, making it difficult to detect through standard auditing methods.
AI · Bearish · arXiv – CS AI · May 11 · 7/10
🧠Research reveals that AI models, particularly few-shot large language models, struggle significantly with mid-range quality responses in automated short answer scoring, while fine-tuned models and human experts maintain consistent performance across all quality levels. This degradation raises fairness concerns for students with developing understanding, emphasizing the need for quality-conditioned evaluation metrics.
🧠 GPT-4 · 🧠 GPT-5 · 🧠 Claude
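A quality-conditioned metric reports agreement separately for each band of reference quality instead of one pooled number, so a drop on mid-range answers is not averaged away. The sketch below is a minimal illustration, not the study's metric: the 0 to 4 score scale, the band cut-offs, the helper names `agreement_by_band` and `band_of`, and the toy scores are all hypothetical, and the scores are not results from the paper.

```python
from collections import defaultdict


def agreement_by_band(human, model, band_of):
    """Exact-agreement rate between model and human scores, computed
    separately for each quality band of the human (reference) score."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for h, m in zip(human, model):
        band = band_of(h)
        totals[band] += 1
        hits[band] += int(h == m)
    return {band: hits[band] / totals[band] for band in totals}


def band_of(score):
    """Hypothetical band cut-offs on a hypothetical 0-4 scale."""
    if score <= 1:
        return "low"
    if score <= 2:
        return "mid"
    return "high"


# Toy scores for illustration only (not data from the study).
human = [0, 1, 2, 2, 2, 3, 4, 4]
model = [0, 1, 1, 3, 2, 3, 4, 4]
print(agreement_by_band(human, model, band_of))
# → {'low': 1.0, 'mid': 0.3333333333333333, 'high': 1.0}
```

Pooled exact agreement on the same toy data would be 6/8 = 0.75, which hides the fact that every miss falls in the mid band.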
AI · Bearish · arXiv – CS AI · May 7 · 7/10
🧠Researchers found that reward models used to align large language models often fail to capture socially desirable preferences, preferring biased, unsafe, or unethical responses across domains like bias, safety, and morality. The study reveals a critical misalignment between how reward models are currently evaluated and their actual performance on social intelligence tasks, exposing a fundamental gap in LLM safety infrastructure.
AI · Neutral · arXiv – CS AI · Mar 17 · 7/10
🧠Researchers have introduced FAIRGAME, a new framework that uses game theory to identify biases in AI agent interactions. The tool enables systematic discovery of biased outcomes in multi-agent scenarios depending on the underlying Large Language Model, the language used, and the agents' characteristics.
AI · Neutral · arXiv – CS AI · Mar 9 · 7/10
🧠Researchers introduce AdAEM, a new evaluation algorithm that automatically generates test questions to better assess value differences and biases across Large Language Models. Unlike static benchmarks, AdAEM adaptively creates controversial topics that yield more distinguishable insights into LLMs' underlying values and cultural alignment.
AI · Bullish · MIT News – AI · Feb 19 · 7/10 · 4
🧠MIT researchers have developed a new method to identify and expose hidden biases, moods, personalities, and abstract concepts within large language models. This breakthrough could help address LLM vulnerabilities and enhance both safety and performance of AI systems.
AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠Researchers introduce CB-SLICE, a new method for identifying systematic errors in deep learning models by leveraging Concept Bottleneck Models to detect error patterns linked to human-understandable concepts. The approach outperforms existing techniques in uncovering model biases and provides more accurate, interpretable explanations of failure modes across multiple benchmarks.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers introduce Persona Generators, AI functions that create diverse synthetic populations for evaluating AI systems across varied user demographics without needing extensive real-world data collection. Using iterative optimization with large language models, the approach generates lightweight code that produces synthetic personas spanning rare trait combinations and long-tail behaviors, outperforming existing baselines on diversity metrics.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers present a unified framework addressing a critical gap between algorithmic fairness and explainable AI (XAI): models can produce fair outputs while employing biased reasoning processes. The study introduces the concept of 'procedural bias' and proposes a conditional invariance framework to formalize and audit explanation fairness, establishing the first comprehensive taxonomy and evaluation workflow for this emerging field.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers demonstrate that multiple fairness impossibility results in machine learning share a common geometric structure rooted in RKHS theory, proving that fairness criteria become mathematically incompatible when base rates differ across groups. The work introduces the 'Pokémon theorem' showing any finite collection of linear fairness constraints leaves residual violations, with implications for fair AI systems in high-stakes applications.
🏢 Meta
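The base-rate incompatibility described above has a well-known finite form for binary classifiers (Chouldechova's identity): FPR = p/(1 − p) · (1 − PPV)/PPV · (1 − FNR), where p is a group's base rate. The sketch below evaluates that textbook identity; it is not the paper's RKHS construction, and the function name `implied_fpr` and all numeric values are hypothetical.

```python
def implied_fpr(base_rate: float, ppv: float, fnr: float) -> float:
    """False-positive rate forced by Chouldechova's identity:
    FPR = p / (1 - p) * (1 - PPV) / PPV * (1 - FNR)."""
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)


# Two hypothetical groups with identical PPV and FNR but different base rates.
ppv, fnr = 0.7, 0.2
fpr_a = implied_fpr(0.3, ppv, fnr)
fpr_b = implied_fpr(0.5, ppv, fnr)
print(f"group A FPR = {fpr_a:.3f}, group B FPR = {fpr_b:.3f}")
# → group A FPR = 0.147, group B FPR = 0.343
```

Holding PPV and FNR equal across groups forces the false-positive rates apart whenever base rates differ, so predictive parity and error-rate balance cannot both hold.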
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers conducted a controlled empirical study evaluating three LLMs (Claude Haiku, DeepSeek-Chat, Gemini 2.5 Flash) for qualitative coding of psychological safety in software engineering communities. Multi-shot prompting improved Claude Haiku's performance but not the others, while all models exhibited systematic biases in coding predictions, providing evidence-based guidelines for LLM-assisted qualitative research.
🧠 Claude · 🧠 Gemini
AI · Neutral · arXiv – CS AI · May 1 · 6/10
🧠Researchers introduce FairMind, an automated tool that detects fairness bias in machine learning datasets using causal analysis and LLM-generated reports. The software applies the standard fairness model to evaluate how protected variables influence predictions through counterfactual reasoning, addressing a critical gap in existing AutoML frameworks that typically ignore fairness considerations.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers introduce GLEaN, a visual explainability method that transforms complex AI bias detection into understandable portrait composites, enabling non-technical audiences to grasp how text-to-image models like Stable Diffusion XL associate occupations and identities with specific demographic characteristics.
🧠 Stable Diffusion
AI · Neutral · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers have launched LLM BiasScope, an open-source web application that enables real-time bias analysis and side-by-side comparison of outputs from major language models including Google Gemini, DeepSeek, and Meta Llama. The platform uses a two-stage bias detection pipeline and provides interactive visualizations to help researchers and practitioners evaluate bias patterns across different AI models.
🏢 Hugging Face · 🧠 Gemini · 🧠 Llama
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 7
🧠Researchers identify a critical flaw in Vision-Language Model evaluation for radiology, where high benchmark scores mask models' failure to generate clinically specific terminology. They propose new metrics including Clinical Association Displacement (CAD) to measure bias and clinical signal loss across demographic groups.
AI · Neutral · arXiv – CS AI · Mar 2 · 7/10 · 19
🧠Researchers have developed an automated pipeline to detect hidden biases in Large Language Models that do not appear in the models' reasoning explanations. Across seven LLMs performing hiring, loan approval, and university admission tasks, the system discovered previously unknown biases tied to factors such as Spanish fluency and writing formality.
AI · Neutral · arXiv – CS AI · Mar 17 · 5/10
🧠Researchers introduce Jacobian Scopes, a new gradient-based method for interpreting how individual tokens influence Large Language Model predictions. The technique uses perturbation theory and information geometry to reveal model biases, translation strategies, and learning mechanisms, with open-source implementations and an interactive demo available.
🏢 Hugging Face
AI · Neutral · Hugging Face Blog · Jun 26 · 5/10 · 4
🧠This installment of the Ethics and Society Newsletter series discusses bias in text-to-image AI models.