AI · Neutral · arXiv – CS AI · 18h ago · 6/10
Sci-Rho: A Multilingual Visually-Grounded Symbolic Benchmark for STEM Problems
Researchers introduce Sci-Rho, a multilingual benchmark of 42,420 visually grounded STEM problem instances in seven languages, designed to test the robustness of vision-language models. The study finds significant gaps between average and worst-case accuracy: smaller models degrade more across languages, while larger proprietary models are more robust.