AINeutralarXiv โ CS AI ยท Feb 276/107
๐ง
PoSh: Using Scene Graphs To Guide LLMs-as-a-Judge For Detailed Image Descriptions
Researchers introduce PoSh, a new evaluation metric for detailed image descriptions that uses scene graphs to guide LLMs-as-a-Judge, achieving better correlation with human judgments than existing methods. They also present DOCENT, a challenging benchmark dataset featuring artwork with expert-written descriptions to evaluate vision-language models' performance on complex image analysis.