🧠 AI | ⚪ Neutral | Importance 5/10

M-QUEST -- Meme Question-Understanding Evaluation on Semantics and Toxicity

arXiv – CS AI | Stefano De Giorgis, Ting-Chih Chen, Filip Ilievski | March 5, 2026 at 05:00 AM

🤖 AI Summary

Researchers introduce M-QUEST, a benchmark for evaluating how well AI models interpret internet memes and assess their toxicity. The underlying framework identifies 10 key dimensions of meme interpretation, and the authors test 8 open-source language models on the benchmark. Instruction-tuned models perform better, but they still struggle with pragmatic inference.

Key Takeaways

→ The M-QUEST benchmark consists of 609 question-answer pairs across 307 memes and is designed to test models' toxicity detection capabilities.
→ The framework identifies 10 dimensions crucial for meme understanding, including textual, visual, emotional, and toxicity assessment.
→ Current large language models vary in how well they interpret toxic memes, depending on their architecture.
→ Models with instruction tuning and reasoning capabilities significantly outperform the others in meme comprehension.
→ Pragmatic inference questions remain the most challenging for AI models to answer accurately.
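
To make the idea of a per-question-type comparison concrete, here is a minimal sketch of how accuracy could be broken down by category on a question-answer benchmark. It is not the authors' evaluation code. The record keys (`category`, `gold`, `predicted`), the category names, and the exact-match scoring are all assumptions made for illustration, and the toy records are placeholders, not M-QUEST data or results.

```python
from collections import defaultdict


def accuracy_by_category(records):
    """Return {category: fraction of correct answers}.

    Each record is a dict with hypothetical keys (not from the paper):
      "category"  - the question type, e.g. "pragmatic_inference"
      "gold"      - the reference answer
      "predicted" - the model's answer
    Scoring is plain exact match, which is an assumption.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        total[rec["category"]] += 1
        if rec["predicted"] == rec["gold"]:
            correct[rec["category"]] += 1
    return {cat: correct[cat] / total[cat] for cat in total}


# Toy placeholder records, NOT taken from M-QUEST.
toy_records = [
    {"category": "textual", "gold": "A", "predicted": "A"},
    {"category": "textual", "gold": "B", "predicted": "B"},
    {"category": "pragmatic_inference", "gold": "C", "predicted": "A"},
    {"category": "pragmatic_inference", "gold": "D", "predicted": "D"},
]

scores = accuracy_by_category(toy_records)
# The category with the lowest accuracy is the hardest one for this model.
hardest = min(scores, key=scores.get)
print(scores)
print(hardest)
```

Running the same function over each model's predictions gives one score table per model, which is the kind of breakdown needed to say that one question type is harder than the rest.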