AI · Neutral · arXiv – CS AI · 7h ago · 7/10
StemBind: When MLLMs Get Lost Between Rules and Instances in Abstract Visual Reasoning
Researchers introduce StemBind, a diagnostic benchmark revealing that multimodal large language models can identify visual patterns and rules but frequently fail at the final step of matching answers to those rules. Across 24 frontier models tested on 19,533 tasks, the study identifies rule-to-instance binding (mapping abstract rules to specific visual examples) as the critical bottleneck, a failure point that neither scaling nor chain-of-thought prompting reliably resolves.