AI · Neutral · arXiv – CS AI · 6h ago · 6/10
Do VLMs Reason Like Engineers? A Benchmark and a Stage-wise Evaluation
Researchers introduce EngVQA, a benchmark of 696 problems spanning five engineering subjects that evaluates the engineering reasoning of Vision-Language Models (VLMs). The study finds that current VLMs, despite strong performance on general multimodal tasks, show significant limitations in performing multi-step technical reasoning while maintaining physical consistency.