🧠 AI | Neutral | Importance 6/10

GenPT: Beyond Self-Report for Reliable LLM Psychometrics via Generative Projective Testing

arXiv – CS AI | Ming Wang, Shuang Wu, Bixuan Wang, Lu Lin, Yuxin Chen, Xiaocui Yang, Daling Wang, Shi Feng, Yifei Zhang, Yufan Sun
🤖 AI Summary

Researchers introduce GenPT (Generative Projective Testing), a psychometric methodology that uses AI-generated stimuli to assess the psychological states of language models more reliably than traditional self-report questionnaires. The approach mitigates both training-data contamination and social-desirability bias, and in depression assessment it proved markedly more sensitive to contextual changes than conventional questionnaires.

Analysis

GenPT addresses a fundamental challenge in AI evaluation: measuring the psychological properties of persona-conditioned language models without the methodological biases inherent to self-report instruments. Traditional questionnaires suffer from two critical vulnerabilities: models can regurgitate answer patterns learned from their training corpora, and their responses shift in a consistent direction when primed by social-desirability cues. This research adapts classical projective testing paradigms, namely the Thematic Apperception Test (TAT), the Rorschach inkblot test, and the Sentence Completion Test (SCT), by generating novel stimuli rather than reusing the standard instruments. The result is a three-stage pipeline that produces standardized psychological indicators while resisting both contamination vectors.

The empirical findings show meaningful differences between the two assessment approaches. When questionnaires were administered under social-desirability framing, systematic directional shifts emerged, most pronounced on sensitive measures such as suicidal ideation. GenPT maintained behavioral patterns closer to symmetric baselines, suggesting greater resistance to contextual manipulation. Most strikingly, in longitudinal counseling scenarios with Qwen3, GenPT-based depression scores shifted roughly ten times more than questionnaire scores, which suggests greater sensitivity to actual state changes than static self-report measures provide.
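To make the "roughly tenfold" comparison concrete, here is a minimal Python sketch of how such a ratio could be computed from before-and-after scores. The helper names (`mean_shift`, `sensitivity_ratio`), the 0-100 score scale, and every number are illustrative assumptions; they are not taken from the paper or its released code.

```python
from statistics import mean

def mean_shift(before, after):
    """Mean change in score across paired measurements (after minus before)."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    return mean(a - b for b, a in zip(before, after))

def sensitivity_ratio(shift, reference_shift):
    """How many times larger |shift| is than |reference_shift|."""
    if reference_shift == 0:
        raise ZeroDivisionError("reference shift is zero; ratio is undefined")
    return abs(shift) / abs(reference_shift)

# Made-up depression scores on a hypothetical 0-100 scale for three
# simulated clients, before and after counseling. NOT the paper's data.
questionnaire_before = [50, 52, 48]
questionnaire_after = [49, 50, 48]
projective_before = [50, 52, 48]
projective_after = [40, 42, 38]

q_shift = mean_shift(questionnaire_before, questionnaire_after)
p_shift = mean_shift(projective_before, projective_after)
ratio = sensitivity_ratio(p_shift, q_shift)

print(f"questionnaire shift: {q_shift}")
print(f"projective shift:    {p_shift}")
print(f"sensitivity ratio:   {ratio}")
```

The same `mean_shift` helper could be applied to scores from a neutral administration versus a socially desirable framing, which would give the size and sign of a directional shift of the kind described above.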

This work matters for the AI safety, evaluation, and transparency communities. As language models increasingly receive anthropomorphic treatment and serve in sensitive contexts (mental health, counseling, therapeutic applications), reliable assessment methodologies become critical infrastructure. The research suggests that generative approaches can overcome classical measurement limitations, offering a complementary toolkit when contamination resistance and context sensitivity are priorities. The open-source release of code and stimuli enables broader adoption and validation across different model architectures, and could help establish new standards for psychometric evaluation of AI systems.

Key Takeaways
  • GenPT uses AI-generated stimuli instead of fixed questionnaires, reducing susceptibility to training-data contamination and social-desirability bias.
  • Questionnaire-based assessments showed systematic directional shifts under social-desirability framing, while GenPT remained near baseline patterns.
  • In longitudinal counseling scenarios with Qwen3, GenPT depression scores shifted roughly ten times more than scores from traditional self-report questionnaires.
  • The methodology successfully adapts classical projective testing (Rorschach, TAT, SCT) for large language model evaluation with novel generated stimuli.
  • The open-source release enables validation across different model architectures and could help establish new evaluation standards for AI psychometrics.