🧠 AI | 🔴 Bearish | Importance 7/10

Vision-Language Models Suppress Female Representations Under Ambiguous Input

arXiv – CS AI | Arnau Marin-Llobet, Simon Henniger, Mahzarin R. Banaji | June 1, 2026 at 04:00 AM

🤖 AI Summary

Researchers discovered that vision-language models suppress female representations in their outputs when processing ambiguous images, despite internally encoding female associations. The study introduces LALS, a new metric revealing that models systematically filter out female signals before generation while amplifying male signals, indicating a critical gap between internal model knowledge and biased outputs.

Analysis

This research exposes a failure mode in modern vision-language models (VLMs) that current alignment techniques do not address. VLMs reduce demographic bias when gender is visually explicit, but they collapse toward male defaults on ambiguous inputs, a common situation in real-world deployment. The gap between what models represent internally and what they generate suggests that the bias does not come from skewed training data alone: female associations are encoded, then filtered out by the architecture before the output is produced.

The introduction of LALS represents a methodological advance in AI interpretability, moving beyond surface-level output analysis to examine layer-wise activation patterns. This technique reveals an asymmetric processing pipeline where female-coded visual information peaks mid-network then diminishes systematically, while male signals strengthen throughout. The finding that culturally loaded visual features like clothing color further modulate these associations indicates bias is deeply embedded in how models process multimodal information.
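To make the idea of a layer-wise analysis concrete, here is a minimal sketch of one way to track a gendered signal across layers. It is not the LALS metric, whose exact definition this summary does not give. It is a generic "logit lens"-style proxy: each layer's hidden state is projected onto the vocabulary, and the log-ratio of probability mass on female-coded versus male-coded tokens is recorded. The function names, the toy unembedding matrix, the token ids, and the hidden states are all hypothetical and chosen only for illustration.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def project(hidden, unembed):
    """Map one hidden state (length d) to vocabulary logits (length V)."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in unembed]


def layerwise_log_ratio(hidden_states, unembed, female_ids, male_ids):
    """Per-layer log(P(female tokens) / P(male tokens)).

    The softmax normaliser cancels in the ratio, so the score is the
    difference of two log-sum-exps over the selected token logits.
    Positive values lean female, negative values lean male.
    """
    scores = []
    for hidden in hidden_states:
        logits = project(hidden, unembed)
        female = logsumexp([logits[i] for i in female_ids])
        male = logsumexp([logits[i] for i in male_ids])
        scores.append(female - male)
    return scores


# Toy setup (assumed, not from the paper): 3-token vocabulary, 2-d hidden states.
# Token 0 stands in for a female-coded word, token 1 for a male-coded word.
unembed = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
hidden_states = [[0.0, 0.0], [2.0, 1.0], [1.0, 3.0]]  # early, middle, final layer

scores = layerwise_log_ratio(hidden_states, unembed, female_ids=[0], male_ids=[1])
peak_layer = max(range(len(scores)), key=lambda i: scores[i])
print(scores, peak_layer)  # [0.0, 1.0, -2.0] 1
```

In this toy example the score is neutral at the first layer, leans female in the middle layer, and leans male at the final layer, which mirrors the qualitative pattern described above. With a real model, the hidden states would come from the network's intermediate layers and the token sets would need careful curation.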

For AI developers and organizations deploying vision-language models in high-stakes domains such as hiring, surveillance, and content curation, this research signals that current safety measures are insufficient. Alignment approaches that work on unambiguous inputs can give false confidence in model fairness. The implications may extend beyond gender: if models suppress one demographic signal this thoroughly, similar suppression may occur for other protected characteristics, which could make deployed systems legally and ethically problematic.

Future work must move beyond prompt-level interventions toward architectural solutions addressing how vision and language representations integrate. The research establishes that bias-washing—appearing fair while encoding discrimination—is a genuine concern in state-of-the-art models, necessitating deeper investigation into model internals before deployment.

Key Takeaways

→Vision-language models encode female associations internally but suppress them before generating text outputs on ambiguous images.
→The LALS metric reveals systematic filtering mechanisms where male signals amplify while female signals peak mid-network then diminish.
→Current alignment techniques reduce bias when gender cues are explicit but fail on ambiguous inputs, where models collapse toward male defaults.
→Culturally loaded visual features like clothing color significantly modulate internal gender associations in vision-language models.
→The disconnect between internal representations and outputs suggests bias emerges from architectural filtering rather than training data alone.