MMBU: A Massive Multi-modal Biomedical Understanding Benchmark to Probe the Perception Capabilities of Vision-Language Models
Researchers introduced MMBU, the largest biomedical vision-language benchmark covering 35 medical imaging modalities with structured metadata. Testing 15 open-weight and 2 frontier VLMs revealed that while medical adaptation helps some models, high reported accuracy on existing benchmarks masks significant deficiencies in visual perception and domain generalization.