y0news

CEI: A Benchmark for Evaluating Pragmatic Reasoning in Language Models

arXiv – CS AI | Jon Chun, Hannah Sussman, Adrian Mangine, Murathan Kocaman, Kirill Sidorko, Abhigya Koirala, Andre McCloud, Gwen Eisenbeis, Wisdom Akanwe, Moustapha Gassama, Eliezer Gonzalez Chirinos, Anne-Duncan Enright, Peter Dunson, Tiffanie Ng, Anna von Rosenstiel, Godwin Idowu
🤖 AI Summary

Researchers introduced the Contextual Emotional Inference (CEI) Benchmark, a dataset of 300 human-validated scenarios designed to evaluate how well large language models perform pragmatic reasoning in complex communication. The benchmark tests LLMs' ability to interpret ambiguous utterances across five pragmatic subtypes, including sarcasm, mixed signals, and passive aggression, in varied social contexts.

Key Takeaways
  • CEI Benchmark provides 300 scenarios to test LLM pragmatic reasoning capabilities across workplace, family, social, and service contexts.
  • The dataset covers five pragmatic communication types: sarcasm/irony, mixed signals, strategic politeness, passive aggression, and deflection/misdirection.
  • Low inter-annotator agreement (kappa of 0.06–0.25) reflects the inherent ambiguity of pragmatic inference, where multiple interpretations can be valid.
  • The benchmark includes explicit power dynamics between speakers with three configurations: peer, higher-to-lower, and lower-to-higher relationships.
  • The dataset is released under CC-BY-4.0 license with a 4-level quality control pipeline for validation.
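The kappa range cited above quantifies agreement between annotators after correcting for chance. As a minimal sketch of how such a figure is computed, the following implements Cohen's kappa from scratch; the annotator labels are illustrative stand-ins, not drawn from the CEI dataset itself.

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators labeling the same items."""
    assert len(a) == len(b)
    n = len(a)
    # Observed agreement: fraction of items both annotators label identically.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Expected chance agreement from each annotator's label distribution.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[lbl] * cb[lbl] for lbl in set(ca) | set(cb)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical pragmatic-subtype labels from two annotators on six scenarios
ann1 = ["sarcasm", "mixed", "sarcasm", "deflection", "politeness", "mixed"]
ann2 = ["sarcasm", "sarcasm", "mixed", "deflection", "politeness", "mixed"]
print(round(cohens_kappa(ann1, ann2), 2))  # → 0.54
```

Values near 0 mean agreement barely exceeds chance; the 0.06–0.25 range reported for CEI sits in the "slight to fair" band, consistent with the paper's point that pragmatic interpretation admits multiple defensible readings.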