🧠 AI | ⚪ Neutral | Importance 4/10

Social Norm Reasoning in Multimodal Language Models: An Evaluation

arXiv – CS AI | Oishik Chowdhury, Anushka Debnath, Bastin Tony Roy Savarimuthu | March 5, 2026 at 05:00 AM

🤖 AI Summary

Researchers evaluated five Multimodal Large Language Models (MLLMs) on their ability to reason about social norms in text and image scenarios. GPT-4o performed best overall, and every model reasoned about norms better in text-based scenarios than in image-based ones.

Key Takeaways

→ MLLMs reason about norms better in text-based scenarios than in image-based ones.
→ GPT-4o outperformed the other models in both the text and image modalities.
→ The free model Qwen-2.5VL showed promising performance, ranking second.
→ All evaluated models struggled with reasoning about complex social norms.
→ The research suggests potential for integrating advanced MLLMs with Multi-Agent Systems for social interactions.

Mentioned in AI

Models

GPT-4 (OpenAI)