AINeutralarXiv โ CS AI ยท 14h ago6/10
๐ง
RPA-Check: A Multi-Stage Automated Framework for Evaluating Dynamic LLM-based Role-Playing Agents
RPA-Check introduces an automated four-stage framework for evaluating Large Language Model-based Role-Playing Agents in complex scenarios, addressing the gap in standard NLP metrics for assessing role adherence and narrative consistency. Testing across legal scenarios reveals that smaller, instruction-tuned models (8-9B parameters) outperform larger models in procedural consistency, suggesting optimal performance doesn't correlate with model scale.