arXiv – CS AI
SoCRATES: Towards Reliable Automated Evaluation of Proactive LLM Mediation across Domains and Socio-cognitive Variations
Researchers introduce SoCRATES, a new benchmark for evaluating how well large language models (LLMs) mediate conflicts across diverse scenarios and cultural contexts. Testing eight frontier LLMs shows that even the top-performing mediators resolve only about one-third of disagreements, and that performance varies significantly with cultural identity, emotional reactivity, and party composition.