y0news
#visual-prompting · 1 article
AI · Bullish · arXiv – CS AI · 10h ago · 6/10

Graph-of-Mark: Promote Spatial Reasoning in Multimodal Language Models with Graph-Based Visual Prompting

Researchers introduced Graph-of-Mark (GoM), a visual prompting technique that overlays scene graphs onto images to improve spatial reasoning in multimodal language models (MLMs). In tests across three open-source MLMs and four datasets, GoM improved zero-shot visual question answering and localization accuracy by up to 11 percentage points over existing methods such as Set-of-Mark.
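To make the scene-graph idea concrete, here is a minimal Python sketch of how numbered object marks could be linked by spatial-relation edges before being drawn on an image or described in a prompt. It is an illustration only, not the authors' implementation: the `Mark` structure, the center-based relation rule, and the function names are all assumptions introduced here, and the summary does not say how GoM builds or renders its graphs.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Mark:
    """A numbered object mark with a bounding box (hypothetical structure)."""
    mark_id: int
    label: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def spatial_relation(a, b):
    """Coarse relation of a to b from box centers (image y grows downward).

    Assumed rule for illustration: the dominant axis of the center offset wins.
    """
    (ax, ay), (bx, by) = a.center, b.center
    dx, dy = bx - ax, by - ay
    if abs(dx) >= abs(dy):
        return "left of" if dx > 0 else "right of"
    return "above" if dy > 0 else "below"


def build_scene_graph(marks):
    """Return directed edges (source_id, relation, target_id) for each pair of marks."""
    return [(a.mark_id, spatial_relation(a, b), b.mark_id)
            for a, b in combinations(marks, 2)]


def describe(marks, edges):
    """Render the edges as text lines that could accompany the marked image."""
    by_id = {m.mark_id: m for m in marks}
    return [f"[{s}] {by_id[s].label} is {rel} [{t}] {by_id[t].label}"
            for s, rel, t in edges]


marks = [
    Mark(1, "cup", 0, 0, 10, 10),
    Mark(2, "table", 0, 20, 40, 40),
    Mark(3, "lamp", 50, 0, 60, 10),
]
edges = build_scene_graph(marks)
for line in describe(marks, edges):
    print(line)
```

In this sketch the edges are what would distinguish a graph-based prompt from marks alone: each numbered object carries explicit relations to the others instead of only an identifier.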