CityTrajBench: A Unified Benchmark for City-Scale Vehicle Trajectory Generation
Researchers introduce CityTrajBench, a unified benchmark framework for evaluating vehicle trajectory generation models across urban environments. The framework standardizes datasets, preprocessing, and evaluation metrics to enable fair comparison of statistical, VAE, GAN, diffusion, and flow-matching models, revealing that no single approach dominates all quality criteria.
CityTrajBench addresses a fragmentation problem in urban mobility research: because studies use inconsistent experimental methodologies, it is often unclear whether reported performance differences stem from algorithmic innovations or from variations in experimental setup. By establishing standardized protocols for data ingestion, normalization, and evaluation across heterogeneous model architectures, the benchmark lets researchers separate genuine technical advances from methodological artifacts.
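As a rough illustration of what a shared protocol means in practice, the sketch below applies one normalization and one aggregate metric to real and generated trajectories alike. It is not CityTrajBench's code. The helper names (`normalize`, `trip_length`, `histogram`, `jsd`, `evaluate`), the bounding-box normalization, the fixed trip-length histogram range, and the choice of Jensen-Shannon divergence are all illustrative assumptions.

```python
import math
from collections import Counter


# Hypothetical shared preprocessing: every trajectory, real or generated, is mapped
# into [0, 1]^2 with the same city bounding box (min_lat, max_lat, min_lon, max_lon).
def normalize(traj, bbox):
    min_lat, max_lat, min_lon, max_lon = bbox
    return [((lat - min_lat) / (max_lat - min_lat),
             (lon - min_lon) / (max_lon - min_lon)) for lat, lon in traj]


def trip_length(traj):
    # Sum of straight-line step lengths in normalized coordinates.
    return sum(math.dist(a, b) for a, b in zip(traj, traj[1:]))


def histogram(values, bins, lo=0.0, hi=2.0):
    # Assumed fixed bin edges shared by real and generated data; values >= hi go to the last bin.
    counts = Counter(min(int((v - lo) / (hi - lo) * bins), bins - 1) for v in values)
    return [counts[i] / len(values) for i in range(bins)]


def jsd(p, q):
    # Jensen-Shannon divergence in bits: 0 for identical histograms, at most 1.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def evaluate(real, generated, bbox, bins=10):
    # Identical preprocessing and identical metric for every model under comparison.
    real_len = [trip_length(normalize(t, bbox)) for t in real]
    gen_len = [trip_length(normalize(t, bbox)) for t in generated]
    return jsd(histogram(real_len, bins), histogram(gen_len, bins))


real = [[(0, 0), (0, 5), (5, 5)], [(1, 1), (2, 2)]]
print(evaluate(real, real, (0, 10, 0, 10)))  # 0.0
```

The point of the design is that no model gets its own preprocessing or its own metric. Any score difference then reflects the generated trajectories rather than the pipeline.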
The research builds on a growing recognition that transportation simulation requires reproducible evaluation frameworks. Urban trajectory generation feeds directly into autonomous vehicle development, traffic forecasting, and city planning, applications in which synthetic mobility data must accurately reflect real-world patterns. Prior work lacked a common basis for comparison, which hindered progress toward production-grade solutions.
The benchmark's findings reveal important trade-offs: diffusion-based models (DiffTraj, DiffRNTraj) excel at local geometric fidelity and structure preservation, while flow-matching approaches (TrajFlow) balance global realism with computational efficiency. Notably, simple Markov baselines remain competitive on aggregate statistics, suggesting that for certain applications the properties of the target task matter more than model complexity. This challenges the assumption that sophisticated generative models uniformly outperform simpler alternatives.
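To show why a Markov baseline can hold up on aggregate statistics, here is a minimal first-order Markov chain over grid cells. It assumes trajectories have already been normalized to [0, 1]^2. The functions `to_cells`, `fit_markov`, and `sample` are hypothetical, and the benchmark's own baseline may define states, order, and road-network constraints differently. Because the chain resamples observed transitions in proportion to their counts, it tends to reproduce cell-visit and transition frequencies. It has no mechanism for longer-range geometric structure.

```python
import random
from collections import defaultdict


def to_cells(traj, n):
    # Snap normalized (x, y) points in [0, 1]^2 to an n-by-n grid; id = row * n + col.
    return [min(int(y * n), n - 1) * n + min(int(x * n), n - 1) for x, y in traj]


def fit_markov(cell_trajs):
    # Count start cells and observed cell-to-cell transitions (first-order chain).
    starts = defaultdict(int)
    trans = defaultdict(lambda: defaultdict(int))
    for t in cell_trajs:
        starts[t[0]] += 1
        for a, b in zip(t, t[1:]):
            trans[a][b] += 1
    return starts, trans


def sample(starts, trans, max_len, rng):
    # Draw a start cell, then follow transitions in proportion to their counts
    # until max_len is reached or the current cell has no observed successor.
    cells, weights = zip(*starts.items())
    traj = [rng.choices(cells, weights)[0]]
    while len(traj) < max_len and trans.get(traj[-1]):
        nxt, w = zip(*trans[traj[-1]].items())
        traj.append(rng.choices(nxt, w)[0])
    return traj


rng = random.Random(0)
real = [[0, 1, 2, 3], [0, 1, 2], [4, 1, 2, 3]]
starts, trans = fit_markov(real)
fake = [sample(starts, trans, 10, rng) for _ in range(5)]
```

In this toy example every sample starts at cell 0 or 4 and then passes through cells 1, 2, and 3, since the chain can only emit transitions it has already observed.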
For the AI research community, CityTrajBench establishes infrastructure comparable to COCO for computer vision or SuperGLUE for NLP, enabling standardized measurement of progress. For transportation technology developers and urban planners deploying synthetic mobility data, the framework guides model selection based on specific use-case requirements rather than headline metrics. Future work should extend benchmarking to multi-agent interactions, weather variations, and long-horizon consistency in urban environments.
- CityTrajBench standardizes evaluation across five model families, eliminating experimental inconsistencies that previously obscured true performance differences
- Diffusion-based and flow-matching models show complementary strengths, with no single approach dominating trajectory generation across all quality dimensions
- Simple Markov baselines remain competitive on trip-level statistics, showing that model sophistication should match specific application requirements
- The benchmark supports reproducible research in urban mobility, similar to standardized frameworks in computer vision and NLP
- The multi-objective nature of trajectory generation demands application-specific model selection rather than universal optimization
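The statement that no model dominates on all criteria can be made precise with Pareto dominance. The sketch below assumes every metric is lower-is-better and returns the models that no other model beats on every metric at once. The function names and the scores are made-up toy values for illustration, not results from CityTrajBench.

```python
def dominates(a, b):
    # Lower is better on every metric (e.g., divergence scores).
    # a dominates b if it is no worse everywhere and strictly better somewhere.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores):
    # Models not dominated by any other model; with conflicting criteria this
    # set usually has more than one member, i.e., there is no single winner.
    return sorted(
        name for name, s in scores.items()
        if not any(dominates(o, s) for other, o in scores.items() if other != name)
    )


# Toy, made-up scores (not benchmark results): (local-geometry error, trip-stats divergence).
toy = {"model_a": (0.10, 0.50), "model_b": (0.30, 0.20), "model_c": (0.40, 0.60)}
print(pareto_front(toy))  # ['model_a', 'model_b']
```

Choosing among the models left on the front is where application requirements come in: a use case that weights local geometry picks differently from one that weights trip-level statistics.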