
Rethinking Role-Playing Evaluation: Anonymous Benchmarking and a Systematic Study of Personality Effects

arXiv – CS AI | Ji-Lun Peng, Yun-Nung Chen | March 5, 2026 at 05:00 AM

🤖 AI Summary

Researchers propose an anonymous evaluation method for Role-Playing Agents (RPAs) built on large language models, revealing that current benchmarks are biased by character name recognition. The study shows that incorporating personality traits, whether human-annotated or self-generated by AI models, significantly improves role-playing performance under anonymous conditions.

Key Takeaways

→ Current role-playing agent evaluations are biased because models rely on memory associated with famous character names rather than true role-playing ability.
→ Anonymous evaluation significantly degrades role-playing performance, confirming that character names carry implicit information that models exploit.
→ Incorporating personality traits consistently improves role-playing agent performance in anonymous settings.
→ Self-generated personality traits achieve performance comparable to human-annotated ones, offering a scalable solution.
→ The research establishes a fairer evaluation protocol for assessing role-playing agents and validates personality-enhanced frameworks.
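
The takeaways above can be made concrete with a small sketch of what anonymizing a character profile and appending personality traits might look like. This is an illustration only, not the authors' protocol: the function names `anonymize` and `build_prompt`, the placeholder "Character A", the prompt wording, and the sample profile and traits are all assumptions introduced here.

```python
import re
from typing import Optional, Sequence


def anonymize(text: str, names: Sequence[str], placeholder: str = "Character A") -> str:
    """Replace every listed name or alias with a neutral placeholder.

    Hypothetical helper: the summarized paper does not specify how names are masked.
    """
    # Handle longer names first so "Sherlock Holmes" is masked before "Holmes".
    for name in sorted(names, key=len, reverse=True):
        pattern = re.compile(rf"\b{re.escape(name)}\b")
        # A callable replacement avoids backslash interpretation in the placeholder.
        text = pattern.sub(lambda _match: placeholder, text)
    return text


def build_prompt(
    profile: str,
    names: Sequence[str],
    traits: Optional[Sequence[str]] = None,
) -> str:
    """Build an anonymous role-play prompt, optionally enriched with personality traits."""
    prompt = "Role-play the character described below.\n" + anonymize(profile, names)
    if traits:
        # Traits could be human-annotated or model-generated; both are plain strings here.
        prompt += "\nPersonality traits: " + ", ".join(traits)
    return prompt


if __name__ == "__main__":
    profile = "Sherlock Holmes is a consulting detective. Holmes lives in London."
    print(build_prompt(profile, ["Holmes", "Sherlock Holmes"], ["observant", "aloof"]))
```

Masking names is only the most obvious step: other details in a profile (places, catchphrases, supporting characters) can still identify a famous character, so a real evaluation would need to decide how much of that context to keep.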