#vlm-performance News & Analysis

2 articles tagged with #vlm-performance. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

2 articles

AINeutralarXiv – CS AI · Jun 87/10

🧠

MMBU: A Massive Multi-modal Biomedical Understanding Benchmark to Probe the Perception Capabilities of Vision-Language Models

Researchers introduced MMBU, the largest biomedical vision-language benchmark covering 35 medical imaging modalities with structured metadata. Testing 15 open-weight and 2 frontier VLMs revealed that while medical adaptation helps some models, high reported accuracy on existing benchmarks masks significant deficiencies in visual perception and domain generalization.

AINeutralarXiv – CS AI · Feb 276/107

🧠

PoSh: Using Scene Graphs To Guide LLMs-as-a-Judge For Detailed Image Descriptions

Researchers introduce PoSh, a new evaluation metric for detailed image descriptions that uses scene graphs to guide LLMs-as-a-Judge, achieving better correlation with human judgments than existing methods. They also present DOCENT, a challenging benchmark dataset featuring artwork with expert-written descriptions to evaluate vision-language models' performance on complex image analysis.