Visuospatial Perspective Taking in Multimodal Language Models
Research reveals that multimodal language models (MLMs) have significant deficits in visuospatial perspective-taking (VPT), particularly in Level 2 VPT, which requires adopting another person's viewpoint. The study used two tasks from human psychology to evaluate MLMs' ability to understand and reason from alternative spatial perspectives.