AI · Neutral · arXiv – CS AI · 6h ago · 6/10
AMVICC: A Novel Benchmark for Cross-Modal Failure Mode Profiling for VLMs and IGMs
Researchers introduce AMVICC, a benchmark for profiling failure modes in vision-language models (VLMs) and image generation models (IGMs). The study tests 11 multimodal LLMs and 3 IGMs across 9 visual reasoning categories and finds that both model types struggle with basic visual concepts such as object orientation, quantity, and spatial relationships. Some failures are shared across modalities, while others are model-specific.