y0news
🧠 AI · Neutral · Importance 6/10

PoSh: Using Scene Graphs To Guide LLMs-as-a-Judge For Detailed Image Descriptions

arXiv – CS AI | Amith Ananthram, Elias Stengel-Eskin, Lorena A. Bradford, Julia Demarest, Adam Purvis, Keith Krut, Robert Stein, Rina Elster Pantalony, Mohit Bansal, Kathleen McKeown
🤖AI Summary

Researchers introduce PoSh, an evaluation metric for detailed image descriptions that uses scene graphs to guide LLMs-as-a-Judge and correlates better with human judgments than existing methods. They also present DOCENT, a challenging benchmark of artworks paired with expert-written descriptions, designed to evaluate how well vision-language models (VLMs) describe complex images.
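The summary says only that PoSh uses scene graphs as structured rubrics for an LLM judge. To give a rough illustration of that idea, the Python sketch below flattens a hypothetical scene graph (entities, attributes, relations) into rubric items and scores a description by the fraction of items a judge marks as covered. Everything here is an assumption made for illustration, not PoSh's actual implementation. The names `RubricItem`, `scene_graph_to_rubric`, `stub_judge`, and `coverage_score` are hypothetical. `stub_judge` stands in for the LLM with simple word matching, and the score measures coverage only, ignoring error detection.

```python
import re
from dataclasses import dataclass

# Hypothetical sketch of "scene graph as rubric"; not PoSh's implementation.

@dataclass(frozen=True)
class RubricItem:
    kind: str               # "entity", "attribute", or "relation"
    terms: tuple[str, ...]  # key terms the description is expected to mention

def scene_graph_to_rubric(entities, attributes, relations):
    """Flatten a hypothetical scene graph into one rubric item per node or edge."""
    items = [RubricItem("entity", (e,)) for e in entities]
    items += [RubricItem("attribute", (e, a)) for e, a in attributes]
    items += [RubricItem("relation", (s, p, o)) for s, p, o in relations]
    return items

def stub_judge(item, description):
    """Stand-in for an LLM judge: True if every key term appears as a word."""
    words = set(re.findall(r"[a-z]+", description.lower()))
    return all(term in words for term in item.terms)

def coverage_score(rubric, description, judge=stub_judge):
    """Fraction of rubric items judged as covered (0.0 for an empty rubric)."""
    if not rubric:
        return 0.0
    return sum(judge(item, description) for item in rubric) / len(rubric)

rubric = scene_graph_to_rubric(
    entities=["woman", "dog"],
    attributes=[("woman", "seated")],
    relations=[("dog", "beside", "woman")],
)
print(coverage_score(rubric, "A seated woman reads while a dog sleeps beside her."))  # 1.0
print(coverage_score(rubric, "A seated woman reads a book."))  # 0.5
```

Breaking the graph into per-item checks is what makes a rubric interpretable: each missed entity, attribute, or relation can be reported individually rather than folded into a single opaque score.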

Key Takeaways
  • PoSh uses scene graphs as structured rubrics to guide an LLM judge's evaluation of detailed image descriptions, outperforming existing approaches, including GPT-4o.
  • The DOCENT benchmark pairs artworks with expert-written reference descriptions and with human quality judgments from art history students.
  • PoSh's Spearman correlation with human judgments is 0.05 higher than that of the best open-weight alternatives.
  • Foundation models struggle to achieve full, error-free coverage of images with rich scene dynamics, revealing limitations of current VLMs.
  • The work aims to support advances in assistive text generation and establishes a demanding new task for measuring VLM progress.
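The headline result is a difference in Spearman's rank correlation (rho) between a metric's scores and human judgments. The Python sketch below shows how such a meta-evaluation is commonly computed: rank both score lists (averaging ranks over ties) and take the Pearson correlation of the ranks. The scores are invented toy numbers, not DOCENT data, and the names `human`, `metric_a`, and `metric_b` are hypothetical.

```python
# Spearman's rho as Pearson correlation on tie-averaged ranks (pure Python).

def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy example: human quality ratings vs. two hypothetical metrics.
human = [1, 2, 3, 4, 5]
metric_a = [0.2, 0.4, 0.3, 0.8, 0.9]
metric_b = [0.5, 0.1, 0.4, 0.9, 0.7]
print(round(spearman(human, metric_a), 3))  # 0.9
print(round(spearman(human, metric_b), 3))  # 0.6
```

Because Spearman's rho depends only on rankings, it rewards a metric for ordering descriptions the way humans do, regardless of the metric's scale; a reported "+0.05" is a gap between two such rho values.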