Date
Tue, 03 Jun 2025
12:00
Speaker
Sotirios Stamnas, University of Warwick
The latest advances in generative AI have enabled the creation of synthetic media that is indistinguishable from authentic content. To counteract this, the research community has developed a large number of detectors targeting face-centric deepfake manipulations such as face swapping, face reenactment, face editing, and entire-face synthesis. However, the detection of the most recent types of synthetic video, produced by Text-To-Video (T2V) and Image-To-Video (I2V) models, remains significantly under-researched, largely due to the lack of reliable open-source detection datasets. To address this gap, we introduce DecepTIV, a large-scale fake video detection dataset containing thousands of videos generated by the latest T2V and I2V models. To ensure real-world relevance, DecepTIV features diverse, realistic-looking scenes in contexts where misinformation could pose societal risks. We also include perturbed versions of the videos, created using common augmentations and distractors, to evaluate detector robustness under typical real-world degradations. In addition, we propose a modular generation pipeline that supports seamless extension of the dataset with future T2V and I2V models. The pipeline generates synthetic videos conditioned on real video content, which ensures content similarity between real and fake examples. Our findings show that such content similarity is essential for training robust detectors, as models may otherwise overfit to scene semantics rather than learning generalizable forensic artifacts.