y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#vlm-evaluation News & Analysis

2 articles tagged with #vlm-evaluation. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

2 articles
AIBearisharXiv – CS AI · Apr 147/10
🧠

What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models

Researchers introduce HAERAE-Vision, a benchmark of 653 real-world underspecified visual questions from Korean online communities, revealing that state-of-the-art vision-language models achieve under 50% accuracy on natural queries despite performing well on structured benchmarks. The study demonstrates that query clarification alone improves performance by 8-22 points, highlighting a critical gap between current evaluation standards and real-world deployment requirements.

🧠 GPT-5🧠 Gemini
AINeutralarXiv – CS AI · Mar 36/103
🧠

OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models

Researchers introduce OmniSpatial, a comprehensive benchmark for testing spatial reasoning capabilities in vision-language models (VLMs). The benchmark reveals significant limitations in both open and closed-source VLMs across four major spatial reasoning categories, with over 8,400 question-answer pairs testing advanced cognitive abilities.

$NEAR