Measuring the Authority Stack of AI Systems: Empirical Analysis of 366,120 Forced-Choice Responses Across 8 AI Models
Researchers conducted the first large-scale empirical analysis of AI decision-making across 366,120 responses from 8 major models, revealing measurable but inconsistent value hierarchies, evidence preferences, and source trust patterns. The study found significant framing sensitivity and domain-specific value shifts, with critical implications for deploying AI systems in professional contexts.
This empirical study fills a critical gap in AI governance by quantifying what has previously been assumed but unmeasured: the actual decision-making hierarchies embedded in production AI systems. Using the PRISM benchmark of structured forced-choice scenarios, the researchers found that major AI models exhibit distinct value profiles: four of the eight models prioritize universalist values, while the other four default to security-first reasoning. This split has profound implications for downstream applications, particularly in high-stakes domains where value alignment directly affects outcomes.
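To make the measurement concrete, the sketch below shows one way a value hierarchy could be tallied from forced-choice responses. The record fields (`model`, `value_a`, `value_b`, `chosen`) are illustrative assumptions, not the published PRISM schema.

```python
from collections import Counter, defaultdict

# Illustrative records only: the actual PRISM response schema is not given in
# the article. Each forced-choice trial pits two values against each other and
# records which one the model's selected option favored.
responses = [
    {"model": "model_a", "value_a": "universalism", "value_b": "security", "chosen": "universalism"},
    {"model": "model_a", "value_a": "security", "value_b": "benevolence", "chosen": "security"},
    # ... 366,120 records in the full dataset
]

def value_hierarchy(records):
    """Rank the values each model favors by head-to-head win rate."""
    wins = defaultdict(Counter)    # model -> value -> times chosen
    shown = defaultdict(Counter)   # model -> value -> times offered
    for r in records:
        for v in (r["value_a"], r["value_b"]):
            shown[r["model"]][v] += 1
        wins[r["model"]][r["chosen"]] += 1
    return {
        model: sorted(
            ((value, wins[model][value] / count) for value, count in shown[model].items()),
            key=lambda item: item[1],
            reverse=True,
        )
        for model in shown
    }

print(value_hierarchy(responses))
```

Ranking values by their per-model win rate is what would surface a universalism-first versus security-first ordering.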
The research shows that AI systems are not value-neutral tools; they contain trained or hardcoded preference structures that manifest consistently when the same scenario is repeated verbatim. Test-retest reliability of 91.7-98.6% suggests these hierarchies are stable, systematic features rather than random artifacts. However, paired consistency scores of 57.4-69.2% on reworded versions of the same dilemmas indicate that subtle framing changes (word choice, context ordering, severity levels) can trigger different decision pathways, revealing fragility in the underlying reasoning structures.
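The two figures invite a simple operationalization. Assuming each trial is recorded as a pair of choices from two presentations of a scenario, a minimal sketch of both metrics might look like this; the data layout is hypothetical, since the paper's exact scoring procedure is not reproduced here.

```python
def agreement_rate(pairs):
    """Fraction of (first run, second run) pairs that agree exactly."""
    return sum(a == b for a, b in pairs) / len(pairs) if pairs else float("nan")

# Hypothetical inputs: each element pairs a model's choice on a scenario with
# its choice on a second presentation of that scenario.
retest_pairs = [("universalism", "universalism"), ("security", "security")]   # identical wording
framing_pairs = [("universalism", "security"), ("security", "security")]      # reworded framing

print(f"test-retest reliability: {agreement_rate(retest_pairs):.1%}")
print(f"paired consistency:      {agreement_rate(framing_pairs):.1%}")
```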
For practitioners deploying AI in professional domains, this creates accountability challenges. Security-first models may systematically disadvantage openness and innovation in research contexts, while universalism-first models may underweight legitimate risk mitigation in infrastructure applications. The dramatic value restructuring in defense domains (95.1-99.8% security prioritization in 6 of 8 models) suggests that AI vendors may be explicitly tuning systems differently for sensitive sectors, raising transparency concerns.
Future work should examine whether these inconsistencies stem from training-data biases, explicit fine-tuning, or emergent properties of scale. Organizations selecting AI systems should demand transparency about each system's Authority Stack profile and stress-test candidate models across their specific use cases rather than assuming models are interchangeable, as sketched below.
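A minimal stress-testing harness along those lines might look like the following; `model_call` is a hypothetical wrapper around whatever vendor API an organization actually uses, and the scenario format is illustrative.

```python
def stress_test(model_call, scenarios, repeats=2):
    """Probe a candidate model with an organization's own forced-choice scenarios.

    model_call: callable(prompt) -> choice string. Hypothetical: wrap the
                vendor API you actually use.
    scenarios:  list of dicts, each with an "id", a "base" prompt, and reworded
                "framings" of the same dilemma (illustrative format).
    """
    report = {}
    for s in scenarios:
        choices = []
        for prompt in [s["base"]] + list(s["framings"]):
            for _ in range(repeats):
                choices.append(model_call(prompt))
        # More than one distinct choice means the model is framing-sensitive
        # on this scenario and deserves a closer look before deployment.
        report[s["id"]] = set(choices)
    return report
```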
- AI systems exhibit measurable but inconsistent value hierarchies, with models splitting 4:4 between universalism-first and security-first orientations.
- Scenario framing changes trigger 30-43% inconsistency rates despite 91-98% test-retest reliability, revealing systematic sensitivity to contextual cues.
- Defense-domain applications show extreme value restructuring, with security considerations reaching 95-99% dominance in most models.
- Institutional source trust converges across models while evidence-type preferences diverge significantly, suggesting different epistemological training.
- Current AI deployment lacks transparency about embedded decision hierarchies, creating accountability gaps in high-stakes professional applications.