y0news
🧠 AI | ⚪ Neutral | Importance 6/10

PoSh: Using Scene Graphs To Guide LLMs-as-a-Judge For Detailed Image Descriptions

arXiv – CS AI | Amith Ananthram, Elias Stengel-Eskin, Lorena A. Bradford, Julia Demarest, Adam Purvis, Keith Krut, Robert Stein, Rina Elster Pantalony, Mohit Bansal, Kathleen McKeown
🤖 AI Summary

Researchers introduce PoSh, a new evaluation metric for detailed image descriptions that uses scene graphs to guide LLMs-as-a-Judge, achieving better correlation with human judgments than existing methods. They also present DOCENT, a challenging benchmark dataset featuring artwork with expert-written descriptions to evaluate vision-language models' performance on complex image analysis.
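The idea of a scene graph serving as a rubric can be illustrated with a small sketch. This is not PoSh's implementation. The rubric format, the `RubricItem`/`Triple` types, and the `judge` interface are hypothetical. The keyword-matching `keyword_judge` is a toy stand-in for the LLM call that a real judge would make with each rubric question. The sketch turns the objects, attributes and relations of a scene graph into yes/no checks, then scores a description by the fraction of checks it satisfies.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Triple:
    """One scene-graph relation (hypothetical format), e.g. dog -beside-> woman."""
    subject: str
    relation: str
    obj: str


@dataclass(frozen=True)
class RubricItem:
    question: str              # what an LLM judge would be asked
    keywords: tuple[str, ...]  # used only by the toy keyword judge below


def build_rubric(objects: dict[str, list[str]], triples: list[Triple]) -> list[RubricItem]:
    """Turn scene-graph objects, attributes and relations into yes/no rubric items."""
    rubric = []
    for name, attrs in objects.items():
        rubric.append(RubricItem(f"Does the description mention the {name}?", (name,)))
        for attr in attrs:
            rubric.append(RubricItem(f"Does it describe the {name} as {attr}?", (name, attr)))
    for t in triples:
        rubric.append(RubricItem(
            f"Does it say the {t.subject} is {t.relation} the {t.obj}?",
            (t.subject, t.relation, t.obj)))
    return rubric


def keyword_judge(description: str, item: RubricItem) -> bool:
    """Toy stand-in for an LLM judge: every keyword must appear in the text."""
    text = description.lower()
    return all(k.lower() in text for k in item.keywords)


def rubric_score(description: str, rubric: list[RubricItem],
                 judge: Callable[[str, RubricItem], bool] = keyword_judge) -> float:
    """Fraction of rubric items the judge marks as covered (0.0 for an empty rubric)."""
    if not rubric:
        return 0.0
    return sum(judge(description, item) for item in rubric) / len(rubric)


# Hypothetical scene graph for a reference description.
objects = {"woman": ["seated"], "dog": [], "window": []}
triples = [Triple("dog", "beside", "woman")]
rubric = build_rubric(objects, triples)
score = rubric_score("A seated woman reads while a small dog sleeps beside her.", rubric)
print(f"{len(rubric)} rubric items, coverage {score:.2f}")
```

Here the candidate covers four of the five rubric items (it omits the window), so the sketch reports a coverage of 0.80. Structuring the judgment as per-element checks is what makes such a score interpretable, because each miss points to a specific omitted or misdescribed element.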

Key Takeaways
  • PoSh uses scene graphs as structured rubrics to guide an LLM judge's evaluation of detailed image descriptions, outperforming existing metrics and LLM judges, including GPT-4o.
  • The DOCENT benchmark pairs artworks with expert-written reference descriptions and with human quality judgments from art history students.
  • PoSh's Spearman correlation with human judgments is 0.05 higher than that of the best open-weight alternatives.
  • Foundation models struggle to describe images with rich scene dynamics completely and without errors, revealing limitations in current VLM capabilities.
  • The work could support advances in assistive text generation and establishes a demanding new task for measuring VLM progress.
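The "+0.05 Spearman correlation" figure is a gap between two rank correlations, one per metric, each measured against the same human judgments. A minimal from-scratch sketch of that comparison follows. Spearman's rho is computed as the Pearson correlation of the ranks, and tied values receive average ranks. The scores are illustrative toy numbers and do not come from DOCENT. In practice `scipy.stats.spearmanr` computes the same statistic.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative toy scores for five descriptions, not data from the paper.
human = [1, 2, 3, 4, 5]
metric_a = [1, 3, 2, 4, 5]
metric_b = [2, 1, 3, 5, 4]
rho_a, rho_b = spearman(metric_a, human), spearman(metric_b, human)
print(f"rho_a={rho_a:.2f} rho_b={rho_b:.2f} gap={rho_a - rho_b:+.2f}")
```

For these toy scores, metric A ranks the descriptions more like the human judges do (0.90 versus 0.80), so it "wins" by 0.10. Rank correlation is a natural choice here because it rewards a metric for ordering descriptions the way humans do, regardless of the metric's scale.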