An ethnographic study examines how a civic-tech initiative is attempting to reform data work practices by building online safety datasets collaboratively with communities most impacted by online harms, framing dataset production through a lens of reparative justice rather than extractive labor.
This research addresses a critical gap in how AI safety infrastructure is constructed. Rather than treating data annotation as disposable labor, the studied initiative centers accountability and repair in dataset development. The approach recognizes that those harmed by online toxicity possess essential expertise for building robust safety systems, yet historically receive minimal compensation or agency in dataset governance.
The study's reparative justice framework challenges prevailing industry norms where data work remains largely invisible and undercompensated. Current AI safety practices, including red teaming and safety evaluations, typically operate through extractive relationships that exclude affected communities from meaningful participation in, or decisions about, the datasets that shape their digital experiences. This ethnographic investigation documents both the promise of alternative approaches and the concrete obstacles they face.
The findings carry implications for AI development practices across the industry. As regulatory scrutiny intensifies around AI safety and responsible deployment, questions about dataset provenance and human labor become increasingly material. Companies and researchers building safety systems face growing pressure to demonstrate ethical data practices, though implementation barriers around compensation structures and collective governance remain substantial.
The emphasis on resetting accountability ties suggests that those building future AI safety infrastructure may need to restructure how they engage data workers. The work highlights tensions between rapid dataset scaling and just compensation, a problem without straightforward technical solutions. Organizations developing safety systems will likely face stakeholder pressure to adopt more participatory and accountable approaches, though adoption timelines and business model feasibility remain uncertain.
- Reparative approaches to dataset production center the communities most affected by online harms rather than treating data work as disposable labor.
- Current AI safety evaluation practices often exclude those most affected by online toxicity from governance and compensation structures.
- Implementing just reward systems and collective dataset governance faces substantial tensions between scaling needs and accountability requirements.
- Resetting accountability ties between humans, datasets, and AI systems requires interrupting established norms across the industry.
- Future AI safety infrastructure may face stakeholder pressure to adopt more participatory and equitable data work practices.