From Vulnerable Data Subjects to Vulnerabilizing Data Practices: Navigating the Protection Paradox in AI-Based Analyses of Platformized Lives
This academic paper examines how AI and data science practices can paradoxically increase the vulnerability of the very subjects they aim to protect, using a case study of computer vision analysis of children in monetized YouTube content. The authors develop an ethics protocol identifying four critical decision points—dataset design, operationalization, inference, and dissemination—where technical choices create vulnerabilizing factors including exposure, monetization, narrative fixing, and algorithmic optimization.
The paper addresses a fundamental tension in AI for Social Good initiatives: well-intentioned data-driven advocacy can create new harms through the very act of computational analysis. Rather than treating vulnerability as a fixed characteristic of subjects, the authors reframe it as something actively produced through technical pipeline decisions. This distinction matters because it shifts responsibility from data collection alone to the granular technical choices researchers make when transforming raw data into analytical insights.
The YouTube family vlog case exemplifies this paradox. A journalist seeking to quantify child presence for regulatory protection would necessarily create detailed computational models of children's appearances, behaviors, and environments—potentially generating more detailed surveillance infrastructure than existed before. Each technical decision (what constitutes 'presence,' how to operationalize age, how results are visualized) embeds ethical assumptions that can expose, reduce, or extract value from vulnerable subjects.
For the AI development community, this work challenges the assumption that transparency, consent, and intention suffice for ethical practice. The reflexive ethics protocol provides practical guidance by mapping vulnerabilizing factors across pipeline stages, helping researchers anticipate downstream harms. The framework particularly resonates with growing scrutiny of AI systems trained on children's data and platforms monetizing user-generated content.
The paper signals increasing maturity in AI ethics discourse, moving beyond principle-based guidelines toward methodological accountability. As regulatory frameworks tighten around data protection and AI governance, this approach to mapping technical decisions' ethical consequences may influence how organizations design and audit their AI systems.
- Vulnerability is enacted through data practices and technical decisions, not just inherent to data subjects
- AI-for-good initiatives can inadvertently increase computational exposure and extraction of vulnerable populations
- A four-stage ethics protocol identifies critical decision junctures where harm prevention requires deliberate technical choices
- The framework addresses the 'protection paradox' where advocacy research creates new surveillance infrastructure
- Technical pipeline decisions are ethically constitutive and require reflexive examination of exposure, monetization, narrative fixing, and algorithmic optimization
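The four-stage protocol lends itself to a simple audit structure. The sketch below is purely illustrative: the stage and factor names come from the summary above, but the checklist representation and the `audit` function are hypothetical conveniences, not the authors' implementation.

```python
# Stages and factors as named in the paper's reflexive ethics protocol.
PIPELINE_STAGES = [
    "dataset_design", "operationalization", "inference", "dissemination",
]
VULNERABILIZING_FACTORS = [
    "exposure", "monetization", "narrative_fixing", "algorithmic_optimization",
]

def audit(reviewed):
    """Return, per pipeline stage, the vulnerabilizing factors not yet
    explicitly reviewed. `reviewed` maps each stage to the set of factors
    the team has already examined at that decision point."""
    gaps = {}
    for stage in PIPELINE_STAGES:
        unreviewed = [f for f in VULNERABILIZING_FACTORS
                      if f not in reviewed.get(stage, set())]
        if unreviewed:
            gaps[stage] = unreviewed
    return gaps

# Example: a team that has only considered exposure during dataset design
# still has open questions at every later stage.
gaps = audit({"dataset_design": {"exposure"}})
print(gaps["dataset_design"])  # factors still unreviewed at that stage
```

A structure like this makes the protocol auditable in practice: the output enumerates exactly which decision junctures still lack a deliberate ethical review before the pipeline proceeds.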