AI · Bearish · arXiv – CS AI · 14h ago · 7/10
What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models
Researchers introduce HAERAE-Vision, a benchmark of 653 real-world underspecified visual questions from Korean online communities, revealing that state-of-the-art vision-language models achieve under 50% accuracy on natural queries despite performing well on structured benchmarks. The study demonstrates that query clarification alone improves performance by 8-22 points, highlighting a critical gap between current evaluation standards and real-world deployment requirements.
GPT-5 · Gemini