y0news

CEI: A Benchmark for Evaluating Pragmatic Reasoning in Language Models

arXiv – CS AI | Jon Chun, Hannah Sussman, Adrian Mangine, Murathan Kocaman, Kirill Sidorko, Abhigya Koirala, Andre McCloud, Gwen Eisenbeis, Wisdom Akanwe, Moustapha Gassama, Eliezer Gonzalez Chirinos, Anne-Duncan Enright, Peter Dunson, Tiffanie Ng, Anna von Rosenstiel, Godwin Idowu
🤖 AI Summary

Researchers introduced the Contextual Emotional Inference (CEI) Benchmark, a dataset of 300 human-validated scenarios designed to evaluate how well large language models perform pragmatic reasoning in complex communication. The benchmark tests LLMs' ability to interpret ambiguous utterances across five pragmatic subtypes, including sarcasm, mixed signals, and passive aggression, in varied social contexts.

Key Takeaways
  • CEI Benchmark provides 300 scenarios to test LLM pragmatic reasoning capabilities across workplace, family, social, and service contexts.
  • The dataset covers five pragmatic communication types: sarcasm/irony, mixed signals, strategic politeness, passive aggression, and deflection/misdirection.
  • Low inter-annotator agreement (kappa of 0.06–0.25) reflects the inherent ambiguity of pragmatic inference, where multiple interpretations can be valid.
  • The benchmark includes explicit power dynamics between speakers with three configurations: peer, higher-to-lower, and lower-to-higher relationships.
  • The dataset is released under CC-BY-4.0 license with a 4-level quality control pipeline for validation.
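The kappa range cited above quantifies agreement between annotators after correcting for chance. As a minimal sketch of how such a figure is computed, the following implements Cohen's kappa from scratch; the annotator labels are illustrative stand-ins, not drawn from the CEI dataset itself.

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators labeling the same items."""
    assert len(a) == len(b)
    n = len(a)
    # Observed agreement: fraction of items both annotators label identically.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Expected chance agreement from each annotator's label distribution.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[lbl] * cb[lbl] for lbl in set(ca) | set(cb)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical pragmatic-subtype labels from two annotators on six scenarios
ann1 = ["sarcasm", "mixed", "sarcasm", "deflection", "politeness", "mixed"]
ann2 = ["sarcasm", "sarcasm", "mixed", "deflection", "politeness", "mixed"]
print(round(cohens_kappa(ann1, ann2), 2))  # → 0.54
```

Values near 0 mean agreement barely exceeds chance; the 0.06–0.25 range reported for CEI sits in the "slight to fair" band, consistent with the paper's point that pragmatic interpretation admits multiple defensible readings.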