AI · Bearish · arXiv, CS AI · 4h ago
FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision-Language Models
Researchers introduce FRIEDA, a new benchmark for testing cartographic reasoning in large vision-language models, and it reveals significant limitations. The best AI models achieve only 37-38% accuracy, compared with 84.87% for humans, on complex map interpretation tasks that require multi-step spatial reasoning.