AIBearisharXiv – CS AI · 10h ago6/10
🧠
CheXpercept: A Benchmark for Evaluating Expert-Level Lesion Perception in Chest X-rays
Researchers introduce CheXpercept, a benchmark dataset for evaluating vision-language models on chest X-ray analysis that goes beyond simple disease classification to test clinical-grade lesion perception. Testing 14 VLMs reveals that models perform adequately only at basic detection levels, with accuracy declining sharply on more complex visual tasks, and medical-specific models show no meaningful advantage over general models.