Physical Adversarial Attacks on AI Surveillance Systems: Detection, Tracking, and Visible-Infrared Evasion
This research paper examines physical adversarial attacks on AI surveillance systems through a surveillance-oriented lens, arguing that robustness cannot be assessed from isolated image benchmarks alone. It highlights critical gaps in current evaluation practice: temporal persistence across frames, multi-modal sensing (visible and infrared), realistic attack carriers, and system-level objectives, all of which must be tested under actual deployment constraints.
Physical adversarial attacks on AI systems have traditionally been evaluated in controlled laboratory settings with single-frame image benchmarks, creating a significant gap between academic research and real-world surveillance deployment. This paper reframes the discussion by identifying four critical dimensions absent from most current literature: temporal persistence (whether evasion holds across video frames), sensing modality (handling both RGB and thermal inputs simultaneously), carrier realism (how practical the attack mechanism is to execute), and system-level objectives (disrupting tracking, not just detection).
The shift reflects growing sophistication in adversarial research. Early work demonstrated that carefully crafted perturbations could fool object detectors in isolation. However, deployed surveillance systems maintain temporal identity through multi-object tracking algorithms, operate across multiple sensor types, and are not practically threatened by attacks that require bulky or conspicuous equipment. A perturbation that suppresses detection in a single frame is useless if the system recovers the target's identity in subsequent frames using tracking logic, as the sketch below illustrates.
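To make the tracking argument concrete, here is a minimal sketch (not any specific paper's method) of a SORT-style tracker with greedy IoU association and a short coast window. All class names, thresholds, and boxes are illustrative assumptions: the point is that a detection suppressed in one frame does not break track identity.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

class Track:
    def __init__(self, tid, box):
        self.tid, self.box, self.misses = tid, box, 0

class GreedyIoUTracker:
    """Greedy IoU association; a track survives `max_age` missed frames."""
    def __init__(self, iou_thresh=0.3, max_age=3):
        self.iou_thresh, self.max_age = iou_thresh, max_age
        self.tracks, self.next_id = [], 1

    def update(self, detections):
        unmatched = list(detections)
        for trk in self.tracks:
            best = max(unmatched, key=lambda d: iou(trk.box, d), default=None)
            if best is not None and iou(trk.box, best) >= self.iou_thresh:
                trk.box, trk.misses = best, 0
                unmatched.remove(best)
            else:
                trk.misses += 1  # coast: keep the identity alive for a few frames
        self.tracks = [t for t in self.tracks if t.misses <= self.max_age]
        for det in unmatched:
            self.tracks.append(Track(self.next_id, det))
            self.next_id += 1
        return [(t.tid, t.box) for t in self.tracks if t.misses == 0]

# A pedestrian walks right; an adversarial patch suppresses detection
# only in frame 2. The tracker re-associates the same identity in frame 3.
frames = [
    [(100, 50, 140, 150)],  # frame 1: detected
    [],                     # frame 2: evaded (single-frame attack)
    [(110, 50, 150, 150)],  # frame 3: detected again
]
tracker = GreedyIoUTracker()
for i, dets in enumerate(frames, 1):
    print(f"frame {i}: {tracker.update(dets)}")
# Identity tid=1 persists through the one-frame gap, so per-frame
# evasion alone does not defeat the system-level objective.
```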
This research has profound implications for computer vision vendors, security system designers, and security researchers. Organizations deploying surveillance infrastructure must recognize that benchmark performance on static datasets does not guarantee robustness against coordinated physical attacks. The identified gaps—including distance robustness, camera-pipeline variation, and identity-level metrics—represent areas where vendor claims may be misleading.
Looking forward, the field must move toward surveillance-grade evaluation frameworks that test systems holistically across time, multiple sensor modalities, and realistic constraints. Organizations investing in surveillance AI should demand evaluations meeting these stricter standards rather than relying on traditional computer vision benchmarks.
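As one illustration of what an identity-level metric could look like, the sketch below computes a hypothetical "identity suppression rate": the fraction of attack frames in which the target holds no active track ID, bucketed by sensing modality and distance band. The metric name, data layout, and 5 m banding are assumptions for illustration, not a standard benchmark.

```python
from collections import defaultdict

def identity_suppression_rate(results):
    """results: per-frame dicts with keys 'modality' ('rgb' | 'thermal'),
    'distance_m' (float), and 'target_tracked' (bool)."""
    buckets = defaultdict(lambda: [0, 0])  # (suppressed frames, total frames)
    for r in results:
        lo = int(r['distance_m'] // 5) * 5  # 5 m distance bands
        key = (r['modality'], f"{lo}-{lo + 5}m")
        buckets[key][0] += 0 if r['target_tracked'] else 1
        buckets[key][1] += 1
    return {k: supp / tot for k, (supp, tot) in buckets.items()}

# Toy per-frame results: the attack suppresses RGB at close range, but the
# thermal channel still tracks the target, so the dual-modal system wins.
toy = [
    {'modality': 'rgb',     'distance_m': 4.0, 'target_tracked': False},
    {'modality': 'rgb',     'distance_m': 4.5, 'target_tracked': False},
    {'modality': 'thermal', 'distance_m': 4.0, 'target_tracked': True},
    {'modality': 'thermal', 'distance_m': 4.5, 'target_tracked': True},
]
for key, rate in identity_suppression_rate(toy).items():
    print(key, f"suppression={rate:.0%}")
```

Reporting per-modality, per-distance rates like this exposes exactly the failure mode that a single detection-level mAP number hides.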
- Physical adversarial attacks on surveillance systems require evaluation across multiple dimensions: temporal persistence, multi-modal sensing, realistic carriers, and system-level objectives
- Current per-frame benchmark evaluations fail to capture real-world robustness because deployed systems use multi-object tracking to maintain identity across video frames
- Visible-infrared dual-modal evasion represents a critical gap, as many surveillance systems integrate both RGB and thermal sensing that academic research rarely evaluates together (a joint-loss sketch follows this list)
- Realistic attack carrier design fundamentally changes threat models, distinguishing conspicuous patches from wearable or dynamically activated mechanisms
- Surveillance-grade evaluation frameworks must include distance robustness, camera-pipeline variation, and identity-level metrics rather than relying solely on detection-level performance
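As referenced in the dual-modal bullet above, here is a minimal sketch of why joint visible-infrared evasion is harder than attacking either channel alone: a worst-case (max) objective only decreases when both detector confidences are suppressed. The score values are hypothetical stand-ins for differentiable detector outputs, not any published attack formulation.

```python
def dual_modal_evasion_loss(score_rgb: float, score_thermal: float) -> float:
    """Worst-case joint objective: the attack succeeds only when *both*
    modalities are suppressed, so we penalize the larger of the two
    confidences rather than their average. Averaging would let an
    attacker trade a very low RGB score against a still-high thermal
    score, which a dual-modal system would catch."""
    return max(score_rgb, score_thermal)

# Example: RGB fully evaded (0.05) but thermal still confident (0.90).
# An averaged loss (0.475) looks half-solved; the max loss (0.90) shows
# the dual-modal system still detects the target.
print(dual_modal_evasion_loss(0.05, 0.90))  # -> 0.9
```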