AI · Neutral · arXiv CS AI · 7h ago · 6/10
VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining
Researchers introduce VTC-Bench, a comprehensive benchmark for evaluating how well multimodal AI models use visual tools to solve complex tasks. The benchmark reveals significant limitations in current models: the leading model, Gemini-3.0-Pro, achieves only 51% accuracy on multi-tool visual reasoning tasks.
Gemini