
CityLens: Evaluating Large Vision-Language Models for Urban Socioeconomic Sensing

arXiv – CS AI | Tianhui Liu, Hetian Pang, Xin Zhang, Tianjian Ouyang, Zhiyuan Zhang, Jie Feng, Yong Li, Pan Hui
AI Summary

Researchers introduced CityLens, a comprehensive benchmark for evaluating Large Vision-Language Models' ability to predict socioeconomic indicators from urban imagery. The study tested 17 state-of-the-art LVLMs across 11 prediction tasks using data from 17 global cities, revealing promising capabilities but significant limitations in urban socioeconomic analysis.

Key Takeaways
  • CityLens is the most extensive socioeconomic benchmark to date, covering 17 cities across 6 key urban domains including economy, education, crime, transport, health, and environment.
  • The benchmark evaluates 17 state-of-the-art Large Vision-Language Models using satellite and street view imagery across 11 prediction tasks.
  • Three evaluation paradigms were used: Direct Metric Prediction, Normalized Metric Estimation, and Feature-Based Regression.
  • Results show LVLMs have promising perceptual and reasoning capabilities but still fall well short of reliably predicting urban socioeconomic indicators.
  • The framework provides a unified approach for diagnosing LVLM limitations and guiding future urban analysis applications.
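To make the Feature-Based Regression paradigm concrete, here is a minimal, hypothetical sketch: image-level features (standing in for LVLM-derived representations) are fed to a simple ridge regressor to predict a socioeconomic indicator, scored with R². The data, feature dimensions, and regularization value are all illustrative assumptions, not details from the paper.

```python
import numpy as np

# Illustrative sketch of feature-based regression evaluation.
# X stands in for LVLM-derived image features; y is a synthetic
# socioeconomic indicator (assumptions, not CityLens data).
rng = np.random.default_rng(0)
n_images, n_features = 200, 16
X = rng.normal(size=(n_images, n_features))
true_w = rng.normal(size=n_features)
y = X @ true_w + rng.normal(scale=0.5, size=n_images)

# Ridge regression via the closed-form normal equations
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)
y_pred = X @ w

# Coefficient of determination (R^2) as the evaluation metric
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
print(f"R^2 = {r2:.3f}")
```

The same scoring loop would simply be repeated per task and per model to compare the 17 LVLMs across the 11 prediction tasks.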