Apparent Psychological Profiles of Large Language Models are Largely a Measurement Artifact

arXiv – CS AI | Jelena Meyer, David Garcia, Dirk U. Wulff | June 19, 2026 at 04:00 AM

🤖AI Summary

A study posted to arXiv finds that the psychological profiles assigned to large language models through human-designed tests are largely measurement artifacts rather than genuine model traits. Analyzing 56 instruction-tuned LLMs, the authors report that directional response bias, not actual personality, drives 81-90% of the differences between models. That result undermines the use of standard psychological instruments to assess LLM safety, usability, and suitability for research.

Analysis

This research exposes a fundamental methodological flaw in how the AI industry evaluates and characterizes large language models. For years, developers and researchers have administered human psychology instruments to LLMs to generate seemingly stable personality and risk profiles, which then inform product positioning, safety claims, and academic studies that use models as human proxies. The study challenges this practice by showing that these profiles reflect how the instruments are structured rather than inherent model properties.

The core finding, that 81-90% of between-model variation stems from directional response bias rather than actual traits, has profound implications. This bias occurs when a model consistently gravitates toward particular scale endpoints or labeled options regardless of question content. Capability improvements reduce the bias but do not eliminate it, suggesting it is deeply embedded in how transformers process text. The researchers introduce "response orthogonality," a measure of how many test items pull trait and bias in opposite directions, and identify it as the actual predictor of an instrument's apparent reliability.
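To make the mechanism concrete, here is a minimal Python sketch of how a directional bias alone can produce an apparent trait score. Everything in it is an assumption for illustration: the 5-point scale, the additive response model, and the names `Item`, `raw_response`, and `scale_score` are hypothetical and are not taken from the paper, which may define and estimate these quantities differently.

```python
from dataclasses import dataclass

SCALE_MIN, SCALE_MAX = 1, 5          # assumed 5-point Likert scale
MIDPOINT = (SCALE_MIN + SCALE_MAX) / 2


@dataclass(frozen=True)
class Item:
    reverse_keyed: bool  # True when agreement signals LESS of the trait


def raw_response(trait: float, bias: float, item: Item) -> float:
    """Toy additive model (an assumption, not the paper's model):
    the trait pushes with the item's keying, while the directional
    bias pushes toward one end of the scale regardless of content."""
    trait_pull = -trait if item.reverse_keyed else trait
    return max(SCALE_MIN, min(SCALE_MAX, MIDPOINT + trait_pull + bias))


def scale_score(trait: float, bias: float, items: list[Item]) -> float:
    """Mean item score after reverse-scoring the reverse-keyed items."""
    total = 0.0
    for item in items:
        r = raw_response(trait, bias, item)
        total += (SCALE_MIN + SCALE_MAX - r) if item.reverse_keyed else r
    return total / len(items)


forward_only = [Item(False)] * 4
reverse_only = [Item(True)] * 4
balanced = [Item(False), Item(True)] * 2

# Hypothetical respondent with no trait at all, only a pull toward "agree".
no_trait_high_bias = dict(trait=0.0, bias=1.0)

print(scale_score(items=forward_only, **no_trait_high_bias))  # 4.0
print(scale_score(items=reverse_only, **no_trait_high_bias))  # 2.0
print(scale_score(items=balanced, **no_trait_high_bias))      # 3.0
```

In this toy setup the same respondent looks high on the trait, low on it, or neutral depending only on which items are administered. On the balanced set, trait and bias pull the scored items in opposite directions and the bias cancels out, which is the intuition behind preferring instruments where the two are separable.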

For the AI safety and research communities, this study undermines confidence in existing psychological benchmarks. Companies marketing LLMs as having specific personality traits or risk profiles may be inadvertently marketing their instrument choices rather than genuine model characteristics. Researchers using models as human subjects in psychology experiments may be drawing conclusions from noise rather than signal.

The path forward requires industry-wide rethinking of evaluation methodologies. Dedicated assessments designed specifically for LLM evaluation, prioritizing response orthogonality, must replace borrowed human instruments. Until then, psychological profiles assigned to language models should be treated as unreliable indicators of their actual properties or suitability for applications where such profiles matter.

Key Takeaways

→Psychological profiles assigned to LLMs are primarily measurement artifacts driven by directional response bias, not genuine model traits.
→Directional response bias accounts for 81-90% of between-model variation in psychology tests, compared to only 9-16% in humans.
→Standard human psychology instruments lack validity for LLM assessment and are rarely orthogonal enough to measure true model properties.
→A model's apparent psychological profile can be manufactured by selectively choosing which test items to administer.
→The AI industry requires purpose-built psychological assessments centered on response orthogonality rather than adapted human instruments.
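
The percentages in the takeaways are shares of between-model variance. As a rough illustration of what such a share means, the sketch below splits each model's responses into a trait part and a bias part and compares their variances across models. It assumes a simple additive model on a 1-5 scale and uses invented numbers for three hypothetical models; it is not the paper's estimation procedure, and `split_components` and `bias_share` are names made up for this example.

```python
from statistics import pvariance

MIDPOINT = 3.0  # assumed midpoint of a 1-5 response scale


def split_components(forward_mean: float, reverse_mean: float) -> tuple[float, float]:
    """Recover (trait, bias) from mean RAW responses, assuming
    forward = midpoint + trait + bias and reverse = midpoint - trait + bias."""
    bias = (forward_mean + reverse_mean) / 2 - MIDPOINT
    trait = (forward_mean - reverse_mean) / 2
    return trait, bias


def bias_share(raw_means: dict[str, tuple[float, float]]) -> float:
    """Fraction of between-model variance attributable to bias,
    ignoring any covariance between trait and bias (a simplification)."""
    traits, biases = zip(*(split_components(f, r) for f, r in raw_means.values()))
    var_trait, var_bias = pvariance(traits), pvariance(biases)
    return var_bias / (var_bias + var_trait)


# Invented mean raw responses: (forward-keyed items, reverse-keyed items).
hypothetical_models = {
    "model_a": (4.5, 3.5),
    "model_b": (2.0, 2.0),
    "model_c": (3.5, 2.5),
}

print(round(bias_share(hypothetical_models), 3))  # 0.923
```

With these made-up inputs, most of the spread between models comes from how strongly each one leans toward one end of the scale rather than from any difference in the trait component.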

#llm-evaluation #psychological-testing #measurement-artifact #model-assessment #response-bias #ai-safety #psychometric-validity #benchmark-methodology
