CoCoVideo: The High-Quality Commercial-Model-Based Contrastive Benchmark for AI-Generated Video Detection
Researchers introduce CoCoVideo-26K, a dataset for detecting AI-generated videos produced by commercial AIGC systems, along with CoCoDetect, a companion detection framework. The work addresses a critical gap in deepfake detection by using high-quality synthetic videos from 13 commercial generators. CoCoDetect is a hybrid approach that combines contrastive learning with multimodal AI reasoning to improve detection accuracy.
CoCoVideo-26K targets one of AI's most pressing challenges: detecting sophisticated synthetic media. As commercial video generation tools become more accessible and realistic, the ability to identify synthetic content grows more critical for content authenticity, misinformation prevention, and digital forensics. This research tackles a documented weakness in existing detection systems: most prior datasets rely on lower-quality open-source generators, leaving models unprepared for real-world commercial AIGC systems whose output is hard to tell apart visually from real footage.
The dataset's design, which incorporates semantically aligned real-fake pairs and watermark-free samples, addresses practical limitations that have hindered model generalization. By spanning 13 mainstream commercial generators, CoCoVideo-26K captures the diversity of modern synthetic video creation and establishes a more realistic evaluation benchmark. The proposed CoCoDetect framework uses R3D-18 for spatio-temporal feature extraction and adds a confidence gate that routes uncertain cases to multimodal large language models for physical plausibility reasoning. The result is a hybrid pipeline that pairs a learned video classifier with MLLM-based reasoning.
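The confidence-gate routing described above can be illustrated with a minimal sketch. This is not the authors' implementation: the 0.9 threshold, the `[real, fake]` logit layout, and the names `gated_predict`, `Verdict`, and `mllm_judge` are all assumptions made for illustration. The `mllm_judge` callable is a hypothetical stand-in for a multimodal model that returns a label after reasoning about physical plausibility.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


def softmax(logits: Sequence[float]) -> list:
    """Numerically stable softmax over a short list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


@dataclass
class Verdict:
    label: str         # "real" or "fake"
    confidence: float  # classifier confidence, in [0.5, 1.0] for two classes
    source: str        # "classifier" or "mllm"


def gated_predict(
    logits: Sequence[float],
    mllm_judge: Callable[[], str],
    threshold: float = 0.9,  # assumed value, not taken from the paper
) -> Verdict:
    """Accept the classifier's label when it is confident; otherwise defer.

    `logits` is assumed to be the two-class output [real, fake] of a video
    classifier head (for example, one placed on top of R3D-18 features).
    """
    p_real, p_fake = softmax(logits)
    confidence = max(p_real, p_fake)
    label = "fake" if p_fake > p_real else "real"

    if confidence >= threshold:
        return Verdict(label, confidence, "classifier")

    # Uncertain case: hand the decision to the (hypothetical) MLLM reasoner.
    return Verdict(mllm_judge(), confidence, "mllm")


if __name__ == "__main__":
    confident = gated_predict([0.0, 5.0], mllm_judge=lambda: "real")
    uncertain = gated_predict([0.1, 0.0], mllm_judge=lambda: "fake")
    print(confident.source, confident.label)
    print(uncertain.source, uncertain.label)
```

The design point is cost: an expensive multimodal model is only consulted for the subset of videos where the lightweight classifier is unsure, so the threshold trades accuracy on hard cases against inference expense.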
For stakeholders in content moderation, digital forensics, and platform security, this work provides actionable tools for combating deepfakes at scale. The open-sourcing of code and dataset accelerates industry-wide adoption of improved detection methods. However, this advancement also highlights an ongoing technological arms race—as detection improves, generation techniques simultaneously evolve, requiring continuous dataset updates and model refinement to maintain effectiveness against emerging commercial AIGC systems.
- CoCoVideo-26K introduces the first large-scale dataset specifically designed around high-quality commercial video generators rather than open-source alternatives.
- The hybrid CoCoDetect framework combines contrastive learning with multimodal reasoning to achieve state-of-the-art deepfake detection performance.
- Existing detection models struggle with commercial AIGC because training datasets rely on lower-quality synthetic videos that don't represent real-world threats.
- Open availability of dataset and code enables faster industry adoption of improved detection methods across content platforms and forensic applications.
- The work demonstrates the critical need for continuous dataset updates as video generation quality and commercial AIGC capabilities advance rapidly.