arXiv · CS AI · 7h ago
Mind's Eye: A Benchmark of Visual Abstraction, Transformation and Composition for Multimodal LLMs
Researchers introduced Mind's Eye, a benchmark that evaluates multimodal large language models (MLLMs) on visual reasoning tasks inspired by human intelligence tests. The evaluation reveals a large gap between human performance (80% accuracy) and leading MLLMs (below 50% accuracy), exposing limitations in visuospatial reasoning, visual attention, and conceptual abstraction.
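The reported numbers are plain per-item accuracy scores. As a minimal sketch of how such a benchmark comparison is typically computed (all names and answer data below are illustrative, not taken from the Mind's Eye dataset):

```python
# Minimal sketch of benchmark-style accuracy scoring, assuming each test
# item pairs a gold answer with a responder's predicted answer.
# All answer strings here are hypothetical examples.

def accuracy(predictions, gold):
    """Fraction of items where the prediction matches the gold answer."""
    assert len(predictions) == len(gold) and gold
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

# Hypothetical answers to five multiple-choice visual reasoning items.
gold_answers  = ["B", "C", "A", "D", "B"]
human_answers = ["B", "C", "A", "D", "B"]  # all correct
mllm_answers  = ["B", "A", "A", "C", "D"]  # 2 of 5 correct

print(accuracy(human_answers, gold_answers))  # 1.0
print(accuracy(mllm_answers, gold_answers))   # 0.4
```

A gap like the one reported (80% vs. under 50%) corresponds to the human column matching the gold answers far more often than the model column across the full item set.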