AI · Neutral · arXiv CS AI · 6h ago
Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks
Researchers introduce Ref-Adv, a new benchmark for testing multimodal large language models' visual reasoning capabilities in referring expression tasks. The benchmark reveals that current MLLMs, despite performing well on standard datasets like RefCOCO, rely heavily on shortcuts and show significant gaps in genuine visual reasoning and grounding abilities.