
Multimodal Large Language Models as Synthetic Participants in Video-Based Studies: An Evaluation

arXiv – CS AI | Prabal Shrestha, Bohan Jiang, Haoning Xue, Huan Liu, Xinyi Zhou
🤖 AI Summary

Researchers evaluated whether multimodal large language models (MLLMs) like Gemini 3 Flash and Qwen 3 Omni can replicate human subjective responses in video perception tasks using the Perceived Message Sensation Value framework. The study found significant limitations: MLLMs demonstrated systematic biases including downward mean-shift, central-tendency bias, and inconsistent sensitivity to participant profiles, suggesting current models remain unreliable as synthetic human participants for subjective research.

Analysis

This research addresses a critical frontier in AI development: whether large language models can authentically simulate human subjective judgment beyond objective reasoning tasks. The study tested whether leading MLLMs could approximate how diverse individuals emotionally respond to video content, a task requiring not just comprehension but simulation of personal context and individual variation. The findings reveal fundamental gaps in how these models process and respond to subjective criteria.

The research builds on growing interest in using AI systems as research participants and synthetic test subjects. As organizations increasingly explore MLLMs for survey research, user testing, and behavioral simulation, understanding their limitations becomes essential. The systematic biases identified, particularly the downward mean-shift and the flattening of subgroup differences, suggest that these models compress subjective responses toward the middle of the scale rather than capturing the spread of real human responses.
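To make the two biases concrete, here is a minimal sketch of how one might quantify them, assuming paired sets of human and model ratings on the same scale. The function names, the 1-7 scale, the sample numbers, and the choice of statistics (difference of means for mean-shift, ratio of standard deviations for central tendency) are all illustrative assumptions, not the metrics or data used in the study.

```python
from statistics import mean, pstdev


def mean_shift(human, model):
    """Model mean minus human mean; a negative value is a downward shift."""
    return mean(model) - mean(human)


def spread_ratio(human, model):
    """Model std divided by human std; below 1 means the model's ratings
    are more tightly clustered than the human ratings."""
    return pstdev(model) / pstdev(human)


# Hypothetical ratings on an assumed 1-7 scale (made-up numbers).
human_ratings = [2, 3, 5, 6, 7, 4, 6, 1]
model_ratings = [3, 3, 4, 4, 5, 3, 4, 3]

shift = mean_shift(human_ratings, model_ratings)
ratio = spread_ratio(human_ratings, model_ratings)

print(f"mean shift:   {shift:+.3f}")
print(f"spread ratio: {ratio:.3f}")
```

With these made-up numbers the shift is negative and the spread ratio is below 1, which is the pattern the summary describes: ratings that sit lower on average and cluster more tightly than the human ones.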

For AI developers and researchers, these results indicate that scaling model size alone doesn't solve the subjective judgment problem. The study shows that prompting strategies produce inconsistent improvements, sometimes enhancing certain metrics while degrading others. This suggests subjective simulation requires fundamentally different approaches than those optimized for objective reasoning tasks.

The implications extend to industries relying on synthetic human validation, from content creation platforms to UX research. Organizations considering MLLMs as cost-effective replacements for human participant studies must account for these documented biases. The open-source release of data and code enables further investigation into whether architectural changes or alternative training approaches could better capture human subjective diversity.

Key Takeaways
  • Leading MLLMs show systematic downward bias and central-tendency effects when rating subjective video engagement metrics
  • Current models flatten demographic and individual differences rather than capturing authentic human variation
  • Prompting strategies produce inconsistent improvements across different evaluation metrics
  • Subjective human simulation requires approaches distinct from objective reasoning task optimization
  • Organizations using MLLMs for research must account for documented biases in synthetically generated responses