AIBullisharXiv โ CS AI ยท 4h ago4
๐ง
Interpretable Debiasing of Vision-Language Models for Social Fairness
Researchers have developed DeBiasLens, a new framework that uses sparse autoencoders to identify and deactivate social bias neurons in Vision-Language models without degrading their performance. The model-agnostic approach addresses concerns about unintended social bias in VLMs by making the debiasing process interpretable and targeting internal model dynamics rather than surface-level fixes.