y0news
AnalyticsDigestsSourcesRSSAICrypto
#opencv1 article
1 articles
AINeutralarXiv โ€“ CS AI ยท 7h ago6/10
๐Ÿง 

VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining

Researchers introduce VTC-Bench, a comprehensive benchmark for evaluating multimodal AI models' ability to use visual tools for complex tasks. The benchmark reveals significant limitations in current models, with leading model Gemini-3.0-Pro achieving only 51% accuracy on multi-tool visual reasoning tasks.

๐Ÿง  Gemini