AI · Neutral · arXiv – CS AI · 7h ago · 6/10
Metadata-Aware Multi-Prompt Reasoning for Zero-Shot Accident Understanding
Researchers present a three-stage pipeline for zero-shot accident understanding in surveillance videos, built on vision-language models. The method decomposes the task into three components: when the accident occurs (temporal localization), what type it is (semantic classification), and where it happens (spatial grounding). The authors report significant improvements over baseline approaches on the ACCIDENT benchmark.
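
The when/what/where decomposition can be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: the function `understand_accident`, the `AccidentReport` type, the three scoring callables, and the category labels are all hypothetical stand-ins for calls to a vision-language model, and the sketch does not model the metadata-aware or multi-prompt aspects named in the title.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Frame = str  # hypothetical placeholder for a decoded video frame
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), normalized


@dataclass
class AccidentReport:
    when: int   # index of the frame judged most likely to show the accident
    what: str   # predicted accident category
    where: Box  # region of the accident in that frame


def understand_accident(
    frames: Sequence[Frame],
    labels: Sequence[str],
    score_frame: Callable[[Frame], float],       # assumed VLM-backed scorer
    score_label: Callable[[Frame, str], float],  # assumed VLM-backed scorer
    locate: Callable[[Frame, str], Box],         # assumed VLM-backed grounder
) -> AccidentReport:
    if not frames or not labels:
        raise ValueError("need at least one frame and one label")
    # Stage 1 (when): pick the frame with the highest accident score.
    when = max(range(len(frames)), key=lambda i: score_frame(frames[i]))
    key_frame = frames[when]
    # Stage 2 (what): pick the label that best matches the key frame.
    what = max(labels, key=lambda label: score_label(key_frame, label))
    # Stage 3 (where): ground the predicted label spatially in the key frame.
    where = locate(key_frame, what)
    return AccidentReport(when=when, what=what, where=where)


# Usage with stub scorers in place of a real model (all values are made up).
frame_scores = {"f0": 0.1, "f1": 0.9, "f2": 0.3}
report = understand_accident(
    frames=["f0", "f1", "f2"],
    labels=["rear-end", "side-impact"],
    score_frame=lambda frame: frame_scores[frame],
    score_label=lambda frame, label: 1.0 if label == "side-impact" else 0.2,
    locate=lambda frame, label: (0.2, 0.3, 0.6, 0.8),
)
```

Passing the three stages in as callables keeps the sketch independent of any particular model or prompt format; each stage consumes the output of the previous one, which is the only structural claim taken from the summary.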