Can You Trust What You See? Human and AI Detection of Synthetic Legal Evidence
Researchers evaluated humans and advanced AI models on detecting synthetic legal evidence, finding both groups unreliable authenticators. Human accuracy dropped to near-chance levels (48-51%) against leading image generators, while AI models achieved perfect specificity but missed most synthetic outputs, suggesting visual evidence requires multi-layered verification in legal proceedings.
The authentication gap this research exposes has direct implications for legal systems that increasingly depend on visual evidence. Researchers created SLED-1400, a dataset pairing 200 authentic photographs with 1,200 AI-generated counterparts across ten evidence categories, then tested both human judges and multimodal language models (MLLMs). The results expose a dangerous gap: humans perform at near-chance levels (48-51%) against sophisticated generators like Gemini-3-Pro-Image and Flux-2-Max, while leading MLLMs achieve perfect specificity (they never flag an authentic image as synthetic) but very low sensitivity, missing 94.1% of synthetic images from the hardest generators.
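To make the specificity and sensitivity figures concrete, the short sketch below computes both from a confusion matrix, treating "synthetic" as the positive class. The counts are hypothetical and chosen only to reproduce the reported rates (a 94.1% miss rate and no false alarms on authentic images); they are not the study's actual per-generator counts, and the function name is illustrative.

```python
def detection_metrics(tp: int, fn: int, tn: int, fp: int):
    """Return (sensitivity, specificity, balanced accuracy).

    'Synthetic' is the positive class:
      tp = synthetic images correctly flagged
      fn = synthetic images missed
      tn = authentic images correctly passed
      fp = authentic images wrongly flagged
    """
    sensitivity = tp / (tp + fn)   # share of synthetic images caught
    specificity = tn / (tn + fp)   # share of authentic images passed
    balanced_accuracy = (sensitivity + specificity) / 2
    return sensitivity, specificity, balanced_accuracy


# Hypothetical counts (assumption): 1,000 synthetic images of which 941
# are missed, plus 200 authentic images of which none are flagged.
sens, spec, bal = detection_metrics(tp=59, fn=941, tn=200, fp=0)

print(f"sensitivity:       {sens:.3f}")
print(f"specificity:       {spec:.3f}")
print(f"balanced accuracy: {bal:.4f}")
```

Under these assumed counts, sensitivity is 5.9% and balanced accuracy is about 53%, which illustrates why perfect specificity alone does not make a detector a reliable standalone validator: a flag is trustworthy, but the absence of a flag says very little.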
This research arrives as generative AI capabilities advance faster than detection methods, creating asymmetric risk in adversarial contexts. Legal systems have historically treated photographic evidence as objective truth, yet this assumption breaks down when sophisticated actors can produce fakes that reviewers cannot tell apart from authentic images. The near-zero correlation between human and MLLM errors suggests fundamentally different failure modes: humans struggle with visual subtlety, while models appear to rely on statistical patterns rather than semantic understanding.
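One common way to quantify whether two reviewers fail on the same items is the phi coefficient, which is the Pearson correlation applied to two binary error indicators. The summary does not state which correlation measure the researchers used, so the sketch below is an assumption for illustration, and the error vectors are made-up data rather than results from the study.

```python
from math import sqrt


def phi_coefficient(errors_a, errors_b):
    """Phi coefficient between two equal-length binary error vectors.

    Each entry is 1 if that reviewer misjudged the image, else 0.
    Values near 0 mean the two reviewers' errors are unrelated;
    values near 1 mean they tend to fail on the same images.
    """
    pairs = list(zip(errors_a, errors_b))
    n11 = sum(1 for a, b in pairs if a and b)          # both wrong
    n10 = sum(1 for a, b in pairs if a and not b)      # only A wrong
    n01 = sum(1 for a, b in pairs if not a and b)      # only B wrong
    n00 = sum(1 for a, b in pairs if not a and not b)  # both right
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0  # undefined when one reviewer never (or always) errs
    return (n11 * n00 - n10 * n01) / denom


# Made-up per-image error indicators (assumption, not study data).
human_errors = [1, 1, 0, 0, 1, 0, 1, 0]
model_errors = [1, 0, 1, 0, 1, 0, 0, 1]

print(phi_coefficient(human_errors, model_errors))
print(phi_coefficient(human_errors, human_errors))
```

A near-zero value on real data would mean that knowing a human misjudged an image tells you almost nothing about whether the model did too, which is the property that makes combining the two reviewers worthwhile.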
For the legal technology sector, this demands immediate procedural innovation. Cryptographic provenance standards like C2PA Content Credentials, which attach signed metadata about an asset's origin and edit history, become strategically valuable, shifting the burden from detection to chain-of-custody verification. Courts cannot simply upgrade detection tools; they must implement layered authentication combining trained human review, AI screening, and cryptographic provenance. This creates market opportunities in evidence verification infrastructure while exposing liability risks for legal technology vendors that rely on a single authentication method. Organizations handling visual evidence in litigation, insurance, and intellectual property disputes should expect heightened scrutiny and potential remediation costs as these findings gain legal recognition.
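As a rough illustration of what layered authentication could look like in practice, the sketch below combines the three signals into a single triage outcome. Everything here is hypothetical: the class names, the outcome labels, and the decision policy are assumptions made for this example, not a procedure proposed by the researchers, and the provenance result is passed in as a plain value rather than obtained from any real C2PA library.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Triage(Enum):
    PROVENANCE_VERIFIED = "provenance verified"
    NEEDS_FORENSIC_REVIEW = "needs forensic review"
    LIKELY_SYNTHETIC = "likely synthetic"


@dataclass
class EvidenceChecks:
    # True = credentials verified, False = credentials present but
    # invalid, None = no provenance credentials attached.
    provenance_valid: Optional[bool]
    ai_flagged_synthetic: bool
    human_flagged_synthetic: bool


def triage(checks: EvidenceChecks) -> Triage:
    """Hypothetical policy combining the three layers.

    An AI flag is treated as a strong signal because the reported
    models never misclassified authentic images. The absence of a
    flag is NOT treated as evidence of authenticity, because the
    same models missed most synthetic images.
    """
    if checks.provenance_valid is False or checks.ai_flagged_synthetic:
        return Triage.LIKELY_SYNTHETIC
    if checks.provenance_valid is True and not checks.human_flagged_synthetic:
        return Triage.PROVENANCE_VERIFIED
    return Triage.NEEDS_FORENSIC_REVIEW


# An image with no credentials and no flags is still not cleared.
print(triage(EvidenceChecks(None, False, False)).value)
```

The design choice worth noting is the asymmetry: only verified provenance can clear an image, while a clean result from human and AI screening merely routes it to further review.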
- Humans achieve only 64.8% accuracy overall, falling to near-chance performance (48-51%) against top-tier synthetic image generators in legal evidence detection.
- Advanced AI models never misclassify authentic images but fail to detect 94.1% of synthetic outputs from the hardest generators, making them unreliable standalone validators.
- Human and AI detection errors show minimal correlation, indicating complementary failure modes rather than redundant weaknesses.
- Legal systems require multi-layered authentication combining human review, AI screening, and cryptographic provenance infrastructure like C2PA Content Credentials.
- Visual evidence authentication gaps create market opportunities and liability risks for legal technology providers and litigation participants.