#psychometrics News & Analysis

9 articles tagged with #psychometrics. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

GenPT: Beyond Self-Report for Reliable LLM Psychometrics via Generative Projective Testing

Researchers introduce GenPT (Generative Projective Testing), a psychometric methodology that uses AI-generated stimuli to assess the psychological states of language models more reliably than traditional self-report questionnaires. The approach mitigates contamination from training data and social-desirability bias, and it is significantly more sensitive to contextual changes in depression assessment than conventional methods.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

NICE: A Theory-Grounded Diagnostic Benchmark for Social Intelligence of LLMs

Researchers have developed NICE, a theory-grounded diagnostic benchmark for evaluating the social intelligence of large language models, organizing social abilities into 4 categories and 11 dimensions. Testing across 5 frontier LLMs reveals that while models perform well in aggregate accuracy, they consistently struggle with communication tasks, particularly in multi-turn dialogue, nonverbal understanding, and synchrony.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

The Rise and Fall of $G$ in AGI

Researchers apply psychometric analysis to large language model benchmarks, discovering that AI's general intelligence factor (G-factor) peaked around 2023-2024 before fragmenting as models specialized in reasoning tasks. The finding suggests AI development is shifting from unified capability improvement toward specialized tool-using systems, challenging assumptions about monolithic AGI progress.

AI · Neutral · arXiv – CS AI · Apr 6 · 6/10

Human Psychometric Questionnaires Mischaracterize LLM Psychology: Evidence from Generation Behavior

Research reveals that standard human psychological questionnaires fail to accurately assess the true psychological characteristics of large language models (LLMs). The study of eight open-source LLMs found significant differences between self-reported questionnaire responses and actual generation behavior, suggesting questionnaires capture desired behavior rather than authentic psychological traits.

AI · Neutral · arXiv – CS AI · Mar 26 · 6/10

Assessment Design in the AI Era: A Method for Identifying Items Functioning Differentially for Humans and Chatbots

Researchers developed a method using Differential Item Functioning (DIF) analysis to identify systematic differences between human and AI chatbot performance on educational assessments. The study tested six leading chatbots, including ChatGPT-4o, Gemini, and Claude, on chemistry and entrance exams, with the aim of helping educators design AI-resistant assessments.

🏢 Meta · 🧠 ChatGPT · 🧠 Claude

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

Developing the PsyCogMetrics AI Lab to Evaluate Large Language Models and Advance Cognitive Science -- A Three-Cycle Action Design Science Study

Researchers have developed PsyCogMetrics AI Lab, a cloud-based platform that applies psychometric and cognitive science methodologies to evaluate Large Language Models. The platform was created through a three-cycle Action Design Science study and aims to advance AI evaluation methods at the intersection of psychology, cognitive science, and artificial intelligence.

AI · Neutral · arXiv – CS AI · Feb 27 · 6/10 · 4

Correcting Human Labels for Rater Effects in AI Evaluation: An Item Response Theory Approach

Researchers propose using psychometric modeling to correct systematic biases in human evaluations of AI systems, demonstrating how Item Response Theory can separate the true quality of AI output from inconsistencies in rater behavior. The approach was tested on OpenAI's summarization dataset and showed improved reliability in measuring AI model performance.

AI · Neutral · arXiv – CS AI · Mar 4 · 4/10 · 3

Learning to Pay Attention: Unsupervised Modeling of Attentive and Inattentive Respondents in Survey Data

Researchers developed an unsupervised machine learning framework using autoencoders and probabilistic models to detect inattentive survey respondents without traditional attention checks. The study found that survey structure is more important than model complexity for detection effectiveness, with well-designed instruments enabling reliable identification of low-quality responses.

AI · Neutral · arXiv – CS AI · Mar 4 · 4/10 · 3

Psychometric Item Validation Using Virtual Respondents with Trait-Response Mediators

Researchers developed a framework using large language models to simulate virtual respondents for validating psychometric survey items, addressing the challenge of ensuring construct validity without costly human data collection. The approach uses trait-response mediators to identify survey items that robustly measure intended psychological traits across three major trait theories.