Who Evaluates AI's Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations
A comprehensive study examining 186 first-party AI model evaluation reports and 248 third-party sources reveals significant gaps in social impact assessments. Developers consistently under-report bias, environmental costs, and labor impacts, even though only they can authoritatively disclose data provenance and infrastructure details. That information is often withheld unless it is tied to compliance or product adoption.
The research exposes a critical governance weakness in AI development: the institutions responsible for evaluating foundation models lack consistent methodologies and transparency standards. First-party evaluations show declining coverage of environmental and bias impacts, suggesting developers deprioritize social accountability when it is not legally mandated. This fragmentation creates an information asymmetry: third-party evaluators conduct rigorous assessments of harmful content and performance disparities, yet lack access to the proprietary infrastructure data that only developers possess.
This evaluation gap reflects broader tensions in AI governance. As foundation models become embedded in high-stakes applications such as healthcare, criminal justice, and hiring, the adequacy of impact assessment directly affects public welfare. Developers face competing incentives: comprehensive social impact reporting can expose liabilities and competitive vulnerabilities, while minimal disclosure avoids regulatory scrutiny and negative publicity. Third-party evaluators operate without standardized frameworks or funding, which leads to redundant efforts and coverage blind spots.
For the AI industry, this landscape threatens legitimacy and invites regulatory intervention. Policymakers observing inconsistent evaluation practices may impose mandates that developers find onerous, or establish centralized evaluation bodies that slow innovation. Investors should recognize that companies demonstrating proactive social impact transparency may face competitive disadvantages in the short term but build durable trust and regulatory resilience in the long term.
The path forward requires structural reform: government-mandated developer transparency requirements, sustainable funding for independent evaluators, and shared infrastructure for aggregating third-party assessments. Without intervention, the current patchwork system will continue failing to adequately capture societal risks from increasingly powerful AI systems.
- First-party AI evaluation reports are declining in coverage of environmental impact and bias, indicating developer deprioritization of social accountability
- Only developers can authoritatively report data provenance, content moderation labor, and infrastructure costs, yet these disclosures remain deprioritized unless legally required
- Third-party evaluators provide more rigorous assessment of bias and harmful content but lack access to the proprietary information needed for complete impact evaluation
- The current governance framework creates information asymmetries that leave major societal risks from foundation models inadequately assessed
- Regulatory mandates for developer transparency and sustainable third-party evaluation infrastructure are needed to address systemic evaluation gaps