RobustSora: De-Watermarked Benchmark for Robust AI-Generated Video Detection
Researchers introduce RobustSora, a benchmark dataset of 6,500 videos designed to isolate whether AI-generated video detectors rely on watermarks or on genuine generation artifacts. Testing across ten detection models reveals that watermark manipulation causes accuracy drops of up to 14 percentage points, demonstrating that current detectors are vulnerable to watermark-removal attacks and may fail to flag AI-generated content once watermarks are absent.
The emergence of advanced AI video generation models like Sora, Pika, and others has created urgent demand for reliable detection mechanisms to combat deepfakes and misinformation. However, RobustSora exposes a critical vulnerability in existing approaches: most commercial AI video generators embed visible watermarks for provenance tracking, and current detection benchmarks fail to test whether detectors identify genuine generation artifacts or simply pattern-match on the watermarks themselves.
This research addresses a fundamental confound in the AI detection landscape. By systematically testing detectors on videos with watermarks removed and on authentic videos with spoofed watermarks injected, the authors provide causal evidence that detection models depend substantially on watermark cues rather than generation-specific features. The variance in impact across generators (11-14pp drops for Sora 2 versus 3-6pp for Pika) suggests that watermark prominence drives detector performance more than architectural differences do.
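To make the paired-evaluation idea concrete, the sketch below scores one detector on the same set of AI-generated videos before and after watermark removal and reports the accuracy drop in percentage points. The `Detector` interface and file handling are hypothetical stand-ins, not the paper's actual harness.

```python
# Minimal sketch of a paired evaluation protocol: run the same detector on
# AI-generated videos before and after watermark manipulation, then report
# the accuracy drop in percentage points. The Detector interface is a
# hypothetical stand-in, not the RobustSora release.
from pathlib import Path
from typing import Protocol


class Detector(Protocol):
    def predict(self, video_path: Path) -> bool:
        """Return True if the video is classified as AI-generated."""
        ...


def accuracy_on(detector: Detector, videos: list[Path]) -> float:
    # All inputs here are AI-generated, so a correct prediction is True.
    hits = sum(detector.predict(v) for v in videos)
    return hits / len(videos)


def watermark_sensitivity(detector: Detector,
                          original: list[Path],
                          dewatermarked: list[Path]) -> float:
    """Accuracy drop (in percentage points) caused by watermark removal."""
    assert len(original) == len(dewatermarked), "expects paired videos"
    drop = accuracy_on(detector, original) - accuracy_on(detector, dewatermarked)
    return 100.0 * drop
```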
For the content moderation and digital trust ecosystem, these findings highlight a critical gap in production-ready detection systems. If detectors fail once watermarks are removed by simple inpainting, they cannot reliably identify AI-generated content in real-world scenarios where bad actors deliberately strip those watermarks. This threatens the integrity of platforms that rely on automated detection for content moderation.
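To illustrate how low the bar for such an attack is, the following sketch applies per-frame Telea inpainting over a fixed watermark region using OpenCV. The watermark coordinates are an assumed example; a real attack would localize the watermark by detection or manual annotation.

```python
# A minimal sketch of the kind of "simple inpainting" removal the paper warns
# about: paint over a fixed watermark region in every frame with OpenCV's
# Telea inpainting. The watermark bounding box is a hypothetical example.
import cv2
import numpy as np

WATERMARK_BOX = (20, 20, 180, 60)  # (x, y, width, height), assumed location


def remove_watermark(in_path: str, out_path: str) -> None:
    cap = cv2.VideoCapture(in_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, (w, h))

    # Binary mask marking the pixels to reconstruct from their surroundings.
    mask = np.zeros((h, w), dtype=np.uint8)
    x, y, bw, bh = WATERMARK_BOX
    mask[y:y + bh, x:x + bw] = 255

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        writer.write(cv2.inpaint(frame, mask, 3, cv2.INPAINT_TELEA))

    cap.release()
    writer.release()
```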
The research offers a practical mitigation: watermark-aware training augmentation, which recovers 3-4pp in accuracy. Going forward, the AI detection community must prioritize watermark-agnostic feature learning and comprehensive evaluation protocols that account for adversarial watermark manipulation, much as cybersecurity evaluates defenses against evasion techniques.
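A minimal sketch of what watermark-aware augmentation could look like follows: training frames of both classes are randomly given watermark-style erasures or synthetic overlays so the watermark region stops being a usable shortcut. Patch sizes, rates, and blending weights are illustrative assumptions, not the paper's exact recipe.

```python
# Sketch of watermark-aware augmentation: randomly erase a watermark-sized
# region or paste a bright semi-transparent patch onto frames of either
# class during training. All constants here are assumed, not from the paper.
import numpy as np

rng = np.random.default_rng(0)


def watermark_augment(frame: np.ndarray, p: float = 0.5) -> np.ndarray:
    """frame: HxWx3 uint8 image; returns an augmented copy."""
    if rng.random() > p:
        return frame
    out = frame.copy()
    h, w = out.shape[:2]
    bw, bh = rng.integers(w // 8, w // 4), rng.integers(h // 16, h // 8)
    x, y = rng.integers(0, w - bw), rng.integers(0, h - bh)
    if rng.random() < 0.5:
        # "Erasure": flatten the region to its mean color, mimicking removal.
        out[y:y + bh, x:x + bw] = out.mean(axis=(0, 1)).astype(np.uint8)
    else:
        # "Spoofing": blend in a bright patch, mimicking an injected logo.
        patch = np.full((bh, bw, 3), 255, dtype=np.uint8)
        region = out[y:y + bh, x:x + bw].astype(np.float32)
        out[y:y + bh, x:x + bw] = (0.6 * region + 0.4 * patch).astype(np.uint8)
    return out
```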
- Current AI video detectors rely heavily on watermarks rather than actual generation artifacts, making them vulnerable to watermark removal
- Watermark manipulation causes accuracy drops of 6-14 percentage points across tested detection models, with statistical significance in 7 of 10 cases
- Sora 2 shows the highest watermark dependency with 11-14pp accuracy loss, while Pika and Open-Sora 2 show lower dependency at 3-6pp
- Watermark-aware training augmentation can recover 3-4 percentage points on both watermark erasure and spoofing tasks
- The benchmark dataset and evaluation methodology establish a new standard for assessing detection robustness against watermark manipulation