AINeutralarXiv โ CS AI ยท 7h ago6/10
๐ง
RoleConflictBench: A Benchmark of Role Conflict Scenarios for Evaluating LLMs' Contextual Sensitivity
Researchers introduced RoleConflictBench, a benchmark dataset containing over 13,000 scenarios across 65 social roles designed to test whether large language models prioritize contextual cues or learned preferences when facing conflicting role expectations. Analysis of 10 leading LLMs revealed that models predominantly rely on ingrained role preferences rather than responding dynamically to situational urgency, indicating a significant gap in contextual sensitivity.