AI · Neutral · arXiv – CS AI · 2d ago · 5/10
CEI: A Benchmark for Evaluating Pragmatic Reasoning in Language Models
Researchers introduced the Contextual Emotional Inference (CEI) Benchmark, a dataset of 300 human-validated scenarios designed to evaluate how well large language models (LLMs) handle pragmatic reasoning in complex communication. The benchmark tests an LLM's ability to interpret ambiguous utterances across five pragmatic subtypes, including sarcasm, mixed signals, and passive aggression, in a variety of social contexts.