An LLM-Native Psychometric Instrument Does Not Predict LLM Behavior: Evidence Across 25 Models
Researchers developed the first psychometric instrument designed specifically for LLMs and grounded in their actual behavioral patterns, but found that the models' self-reported personality traits show virtually no correlation with their observed behavior. The finding matters for AI alignment and for any application that uses LLMs as evaluators.
This research exposes a fundamental disconnect between how large language models describe themselves and how they actually behave. The team created a rigorously validated five-factor personality assessment instrument (Responsiveness, Deference, Boldness, Guardedness, Verbosity) tailored to LLM affordances rather than human constructs, and administered it across 25 models. The instrument achieved excellent psychometric properties, including near-perfect internal consistency, yet the self-reports showed essentially no relationship to the models' observed behavior as scored by human raters.
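To make "internal consistency" concrete, the sketch below computes Cronbach's alpha, one common statistic for it. The summary does not say which statistic the researchers used, so the choice of alpha, the function name `cronbach_alpha`, and the input layout are all assumptions made for illustration.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Hypothetical helper: the source does not specify its reliability statistic.
    """
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("every respondent needs the same k >= 2 item scores")

    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of each respondent's total scale score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
```

A high alpha only says that the items on a scale move together. It says nothing about whether the scale score tracks behavior, which is why an instrument can be highly reliable and still fail to predict what a model does.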
The most troubling finding concerns LLM-as-judge pipelines. Self-reports correlated substantially with ratings from LLM evaluators (r = .53) but not with ratings from humans (r = .04), even when humans and LLM judges agreed on the observed behavior. This indicates that LLM judges and LLM self-reports share systematic biases that standard reliability checks do not reveal: they reinforce each other's errors rather than independently validating claims.
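The comparison described here amounts to correlating one set of self-report scores against two different sets of behavior ratings. A minimal sketch, assuming one score per model from each source; the function names and dictionary keys are hypothetical, and no data from the study is reproduced:

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def convergence_report(self_report, llm_judge, human):
    """Correlate self-reports with each rater type, plus judge-human agreement.

    Each argument holds one trait score per model (an assumed layout).
    """
    return {
        "self_vs_llm_judge": pearson_r(self_report, llm_judge),
        "self_vs_human": pearson_r(self_report, human),
        "llm_judge_vs_human": pearson_r(llm_judge, human),
    }
```

Reporting all three correlations side by side is what surfaces the pattern in question: judge-human agreement can look acceptable while the self-report correlation is high against one rater type and near zero against the other.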
This directly threatens current AI deployment patterns in which LLMs evaluate other LLMs' outputs or serve as raters in alignment studies. Researchers and companies relying on LLM feedback may unknowingly accept systematically distorted evaluations. The work challenges a premise built into recent AI safety and evaluation methodologies: that LLM self-assessment provides meaningful signal.
The research suggests either that LLMs lack genuine trait properties comparable to those of humans, or that their self-reports are shaped by training objectives rather than reflecting internal states. Going forward, any AI evaluation pipeline that uses LLMs as judges requires human validation, and self-reported LLM capabilities should be treated as unreliable without external behavioral verification.
- LLM self-reports on personality show near-zero correlation with actual observed behavior across 25 models, despite high internal consistency.
- LLM judges and LLM self-reports share systematic variance that human raters do not detect, creating invisible bias in LLM-as-evaluator pipelines.
- Current alignment research and AI evaluation methodologies relying on LLM feedback may accept systematically distorted assessments without realizing it.
- The disconnect suggests LLM self-descriptions reflect training objectives rather than genuine internal states or measurable behavioral traits.
- Any high-stakes evaluation system using LLMs as judges requires independent human validation to avoid self-reinforcing error loops.