Negative instance generation for cross-domain facial forgery detection
2025, Vol. 30, No. 2: 421-434
Print publication date: 2025-02-16
DOI: 10.11834/jig.240160
Zhang Jing, Xu Pan, Liu Wenjun, Guo Xiaoxuan and Sun Fang. 2025. Negative instance generation for cross-domain facial forgery detection. Journal of Image and Graphics, 30(2): 421-434
Objective
Deepfake detection trains complex deep neural networks to mine more discriminative facial image representations and thereby obtain highly accurate detection results; it is an important technology for ensuring that facial information remains authentic, reliable, and secure. However, current popular models rely excessively on their training data, so they deliver satisfactory detection performance only within the same domain, generalize poorly in cross-domain scenarios, and may even fail completely. How to achieve effective forged face detection in cross-domain environments with limited training data has therefore become a pressing problem. To this end, this paper proposes a cross-domain facial forgery detection model based on diverse negative instance generation (negative instance generation-FFD, NIG-FFD).
Method
First, a Siamese autoencoder network is constructed to obtain label-consistent latent multi-view fused features, and a contrastive constraint is introduced to improve the discriminability of hard-sample features. Second, while keeping training efficient, construction rules are used to generate more diverse fused negative-instance features, which improves the model's generalizability. Finally, an adaptive importance weight matrix is constructed to prevent the class imbalance caused by negative instance generation from leaving positive samples under-learned.
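As a reading aid, the following is a minimal PyTorch sketch of how a Siamese autoencoder with a contrastive constraint and a reconstruction constraint could be wired together. The network sizes, the margin-based contrastive form, the label convention, and all identifiers are illustrative assumptions, not the released NIG-FFD implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseAutoencoder(nn.Module):
    """Weight-shared encoder for two views; the decoder reconstructs each view
    so that the latent codes retain important information from the input."""
    def __init__(self, in_dim=512, latent_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim))

    def forward(self, x1, x2):
        z1, z2 = self.encoder(x1), self.encoder(x2)
        fused = torch.cat([z1, z2], dim=1)   # label-consistent fused multi-view feature
        return fused, (z1, z2), (self.decoder(z1), self.decoder(z2))

def contrastive_loss(features, labels, margin=1.0):
    """Margin-based contrastive constraint: pull same-label features together,
    push different-label features at least `margin` apart (helps hard samples)."""
    dist = torch.cdist(features, features)                        # pairwise distances
    same = (labels[:, None] == labels[None, :]).float()
    loss = same * dist.pow(2) + (1 - same) * F.relu(margin - dist).pow(2)
    off_diag = 1 - torch.eye(len(labels), device=features.device)  # ignore self-pairs
    return (loss * off_diag).sum() / off_diag.sum()

# Toy usage: x1, x2 are two view features of one batch; y = 1 for real, 0 for fake (assumed convention)
model = SiameseAutoencoder()
x1, x2, y = torch.randn(8, 512), torch.randn(8, 512), torch.randint(0, 2, (8,))
fused, _, (rec1, rec2) = model(x1, x2)
total = contrastive_loss(fused, y) + F.mse_loss(rec1, x1) + F.mse_loss(rec2, x2)
```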
Result
The effectiveness of the proposed model is verified on two popular cross-domain datasets; compared with other state-of-the-art methods, the AUC (area under the receiver operating characteristic curve) improves by 10%. In in-domain detection, the ACC (accuracy) and AUC improve by nearly 10% and 5%, respectively, over the other methods.
Conclusion
Compared with the competing methods, the proposed approach achieves superior performance in both cross-domain and in-domain facial forgery detection. The code of the proposed model has been open-sourced at:
https://github.com/LNNU-computer-research-526/NIG-FFD
Objective
With the rapid development of multimedia, mobile internet, and artificial intelligence technologies, facial recognition has achieved tremendous success in areas such as identity verification and security monitoring. However, with its widespread application, the risk of facial forgery attacks is gradually increasing. These attacks leverage deep learning models to create fraudulent digital content, including images, videos, and audio, posing a potential threat to societal stability and national security. Therefore, deepfake detection is crucial for protecting individual and organizational interests, ensuring public safety, and promoting the sustainable development of innovative technologies. According to the mode of image representation, deepfake detection methods can generally be divided into two categories. The first category, methods based on traditional image feature description, typically performs image processing and feature extraction with signal-transformation models. The second category, methods based on deep learning, employs complex deep neural networks to obtain more discriminative high-dimensional nonlinear facial feature descriptions, thereby improving forgery detection accuracy. Both kinds of methods have achieved satisfactory results in deepfake detection experiments. However, their training and testing samples are mostly collected from the same data domain, which explains their excellent performance under such conditions; in practical applications, testing samples that follow the distribution of the original training samples are difficult to obtain, which limits the application of these models in free-scene forgery detection tasks and can even lead to complete model failure. Some scholars have therefore proposed a data augmentation framework based on structural feature mining to improve the performance of convolutional neural network detectors; however, when faces are seamlessly integrated with backgrounds at the pixel level, its recognition accuracy drops significantly. Other scholars have used transformer network architectures to construct deep forgery detection frameworks. Although such models achieve satisfactory generalization by deeply understanding the manipulated regions, they lack descriptions of local tampering representations, and their detection efficiency is also quite low. On this basis, the main challenges in constructing deepfake detection models for cross-domain scenarios can be summarized as follows: 1) extracting discriminative representations of forged facial images: the forgery process typically tampers with or replaces local features of the image, which makes discriminative features difficult to obtain; and 2) improving the generalizability of detection models: overreliance on current-domain data during training reduces recognition performance on other domains, and the model may fail when facing more challenging free-forgery detection scenarios. This study addresses these challenges by introducing a cross-domain detection model based on diverse negative instance generation.
Method
The model achieves feature augmentation of forged negative instances and enhances cross-domain recognition accuracy and generalizability by constructing a Siamese autoencoder network architecture with multiview feature fusion. It consists of the following three parts. 1) The model implements discriminative multiview feature fusion under contrastive constraints. First, a Siamese autoencoder network is constructed to extract different view features. Second, contrastive constraints are employed to achieve multiview feature fusion. Given that typical facial forgery manipulation involves only small-scale replacements and tampering, the global features of forged facial images are remarkably similar to those of real faces. The contrastive loss enables the differentiation of weakly discriminative hard samples: it maximizes the similarity of intraclass features while minimizing the similarity of interclass features. Finally, a reconstruction loss is used to constrain the feature network by computing the difference between the decoder output and the original input, which guides the supervised feature extraction network to retain important information from the input and emphasizes the learning of discriminative feature representations. 2) The model achieves diverse negative instance feature augmentation to enhance generalizability, ensuring satisfactory recognition performance on cross-domain datasets. First, the rules for generating fused samples are defined. This study statistically visualizes the network output feature histograms of constructed samples with different labels, analyzes the statistical patterns of negative samples, and defines a feature-level sample generation rule: except when both view features come from positive samples, every other combination of feature samples is generated as a negative sample. Second, diverse forged feature sets are constructed from the selected samples so that the network learns more discriminative features. Finally, a global training sample set is obtained by concatenating the original training samples and the augmented samples. 3) The model implements a discriminator with importance sample weighting. Once the abovementioned feature augmentation of negative instances is performed, the number of negative instances increases significantly. This study introduces an importance weighting mechanism to avoid overfitting on negative samples and underfitting on positive samples. First, a weight matrix is initialized to set different weights for each class: negative samples are weighted according to their predicted probabilities while positive samples remain unchanged, thereby approximately achieving class balance during the loss calculation. Through negative sample weighting, the model is guided to pay more attention to positive sample features, and the classification decision boundary is prevented from biasing toward negative samples. Second, the distance between the predicted probability distribution and the true probability distribution is measured via the cross-entropy loss, which serves as the classification loss function supervising the classification results. Finally, the total loss function for model training is obtained.
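To make parts 2) and 3) concrete, the sketch below (hypothetical PyTorch with illustrative names and an assumed weighting formula, not the authors' code) shows the feature-level generation rule, in which a cross-sample fusion is labeled negative unless both source samples are positive, together with an importance-weighted cross-entropy that keeps positive samples at weight 1 while rescaling negatives.

```python
import itertools
import torch
import torch.nn.functional as F

def generate_negative_instances(view_feats, labels, max_new=512):
    """Feature-level augmentation rule described above.
    view_feats: (N, 2, D) two view features per sample; labels: (N,), 1 = positive.
    Any cross-sample fusion whose two sources are NOT both positive becomes a negative."""
    fused, fused_labels = [], []
    for i, j in itertools.islice(itertools.product(range(len(labels)), repeat=2), max_new):
        if labels[i] == 1 and labels[j] == 1:
            continue                                   # both sources positive: not generated as a negative
        fused.append(torch.cat([view_feats[i, 0], view_feats[j, 1]]))
        fused_labels.append(0)                         # every other combination is a negative instance
    return torch.stack(fused), torch.tensor(fused_labels)

def importance_weighted_ce(logits, labels):
    """Cross-entropy with per-sample importance weights: positives keep weight 1;
    negatives are down-weighted once the model predicts them confidently
    (one plausible choice; the paper's exact weighting is not reproduced here)."""
    p_neg = logits.softmax(dim=1)[:, 0]                # predicted probability of the negative class
    weights = torch.where(labels == 0, (1.0 - p_neg).detach(), torch.ones_like(p_neg))
    return (weights * F.cross_entropy(logits, labels, reduction="none")).mean()

# Toy usage: build the global training set from original plus generated features
view_feats, y = torch.randn(16, 2, 64), torch.randint(0, 2, (16,))
aug_feats, aug_y = generate_negative_instances(view_feats, y)
global_feats = torch.cat([view_feats.flatten(1), aug_feats])   # original + augmented fused features
global_y = torch.cat([y, aug_y])
```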
Result
Experiments were conducted on three publicly available datasets, namely, FaceForensics++ (FF++), Celeb-DFv2, and the Deepfake Detection Challenge (DFDC), to verify the effectiveness of the proposed method in a cross-domain environment, and the model was compared with other popular methods. The FF++ dataset comprises three versions based on different compression levels: c0 (original), c23 (high quality), and c40 (low quality); this study used the c23 and c40 versions for experimentation. The Celeb-DFv2 dataset is widely employed to test models' generalization capabilities because its forged images lack the obvious visual artifacts of deepfake manipulation, posing a significant challenge for generalization detection. In the experiments, 100 genuine videos and 100 forged videos were randomly selected, with one image extracted every 30 frames. For the DFDC dataset, 140 videos were randomly selected, with 20 frames extracted from each video for testing. According to the experimental results, the proposed model exhibits a 10% improvement in the area under the receiver operating characteristic curve (AUC) compared with other state-of-the-art methods. Additionally, the model's detection results in the native-domain environment were validated, showing approximately 10% and 5% increases in the ACC (accuracy) and AUC values, respectively, compared with those of the other methods.
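A small OpenCV sketch of the frame-sampling protocol described above (one frame every 30 frames for Celeb-DFv2, and a 20-frame cap per DFDC video); the file paths are placeholders, and the exact sampling stride used for DFDC is an assumption.

```python
import cv2

def sample_frames(video_path, every_n=30, max_frames=None):
    """Read a video and keep every `every_n`-th frame, optionally capping the count."""
    cap = cv2.VideoCapture(video_path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            frames.append(frame)
            if max_frames is not None and len(frames) >= max_frames:
                break
        idx += 1
    cap.release()
    return frames

# Celeb-DFv2: one image every 30 frames; DFDC: 20 frames per video (paths are placeholders)
celeb_frames = sample_frames("Celeb-DF-v2/id0_0000.mp4", every_n=30)
dfdc_frames = sample_frames("DFDC/video_0000.mp4", every_n=30, max_frames=20)
```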
Conclusion
The method proposed in this study achieves superior performance in both cross-domain and in-domain deepfake detection.
Afchar D, Nozick V, Yamagishi J and Echizen I. 2018. MesoNet: a compact facial video forgery detection network // 2018 IEEE International Workshop on Information Forensics and Security (WIFS). Hong Kong, China: IEEE: 1-7 [DOI: 10.1109/WIFS.2018.8630761]
Bayar B and Stamm M C. 2018. Constrained convolutional neural networks: a new approach towards general purpose image manipulation detection. IEEE Transactions on Information Forensics and Security, 13(11): 2691-2706 [DOI: 10.1109/TIFS.2018.2825953]
Cao J Y, Ma C, Yao T P, Chen S, Ding S H and Yang X K. 2022. End-to-end reconstruction-classification learning for face forgery detection // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 4113-4122 [DOI: 10.1109/CVPR52688.2022.00408]
Cao Y G, Chen J Z, Huang L Q, Huang T Q and Ye F. 2023. Three-classification face manipulation detection using attention-based feature decomposition. Computers and Security, 125: #103024 [DOI: 10.1016/j.cose.2022.103024]
Chen L, Zhang Y, Song Y B, Liu L Q and Wang J. 2022. Self-supervised learning of adversarial example: towards good generalizations for deepfake detection // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 18689-18698 [DOI: 10.1109/CVPR52688.2022.01815]
Chen S, Yao T P, Chen Y, Ding S H, Li J L and Ji R R. 2021. Local relation learning for face forgery detection // Proceedings of the 35th AAAI Conference on Artificial Intelligence. [s.l.]: AAAI: 1081-1088 [DOI: 10.1609/aaai.v35i2.16193]
Chollet F. 2017. Xception: deep learning with depthwise separable convolutions // Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 1800-1807 [DOI: 10.1109/CVPR.2017.195]
Cozzolino D, Poggi G and Verdoliva L. 2017. Recasting residual-based local descriptors as convolutional neural networks: an application to image forgery detection // Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security. Philadelphia, USA: ACM: 159-164 [DOI: 10.1145/3082031.3083247]
Ding F, Kuang R S, Zhou Y, Sun L, Zhu X G and Zhu G P. 2024. A survey of deepfake and related digital forensics. Journal of Image and Graphics, 29(2): 295-317 [DOI: 10.11834/jig.230088]
Dolhansky B, Howes R, Pflaum B, Baram N and Ferrer C C. 2019. The deepfake detection challenge (DFDC) preview dataset [EB/OL]. [2024-03-27]. https://arxiv.org/pdf/1910.08854.pdf
Guo Y, Zhen C and Yan P F. 2023a. Controllable guide-space for generalizable face forgery detection // Proceedings of 2023 IEEE/CVF International Conference on Computer Vision. Paris, France: IEEE: 20761-20770 [DOI: 10.1109/ICCV51070.2023.01903]
Guo Z Q, Yang G B, Wang D W and Zhang D Y. 2023b. A data augmentation framework by mining structured features for fake face image detection. Computer Vision and Image Understanding, 226: #103587 [DOI: 10.1016/j.cviu.2022.103587]
Khormali A and Yuan J S. 2024. Self-supervised graph transformer for deepfake detection. IEEE Access, 12: 58114-58127 [DOI: 10.1109/ACCESS.2024.3392512]
Li L Z, Bao J M, Zhang T, Yang H, Chen D, Wen F and Guo B N. 2020a. Face X-ray for more general face forgery detection // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 5000-5009 [DOI: 10.1109/CVPR42600.2020.00505]
Li Y Z, Yang X, Sun P, Qi H G and Lyu S. 2020b. Celeb-DF: a large-scale challenging dataset for deepfake forensics // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 3204-3213 [DOI: 10.1109/CVPR42600.2020.00327]
Luo Y C, Zhang Y, Yan J C and Liu W. 2021. Generalizing face forgery detection with high-frequency features // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 16312-16321 [DOI: 10.1109/CVPR46437.2021.01605]
Qian Y Y, Yin G J, Sheng L, Chen Z X and Shao J. 2020. Thinking in frequency: face forgery detection by mining frequency-aware clues // Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 86-103 [DOI: 10.1007/978-3-030-58610-2_6]
Rössler A, Cozzolino D, Verdoliva L, Riess C, Thies J and Nießner M. 2019. FaceForensics++: learning to detect manipulated facial images // Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 1-11 [DOI: 10.1109/ICCV.2019.00009]
Sagonas C, Antonakos E, Tzimiropoulos G, Zafeiriou S and Pantic M. 2016. 300 Faces in-the-wild challenge: database and results. Image and Vision Computing, 47: 3-18 [DOI: 10.1016/j.imavis.2016.01.002]
Sun K, Liu H, Ye Q X, Gao Y, Liu J Z, Shao L and Ji R R. 2021. Domain general face forgery detection by learning to weight // Proceedings of the 35th AAAI Conference on Artificial Intelligence. [s.l.]: AAAI: 2638-2646 [DOI: 10.1609/aaai.v35i3.16367]
Wang C R and Deng W H. 2021. Representative forgery mining for fake face detection // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 14918-14927 [DOI: 10.1109/CVPR46437.2021.01468]
Wang T Y and Chow K P. 2023. Noise based deepfake detection via multi-head relative-interaction // Proceedings of the 37th AAAI Conference on Artificial Intelligence. Washington, USA: AAAI: 14548-14556 [DOI: 10.1609/aaai.v37i12.26701]
Yang M X, Li Y F, Hu P, Bai J F, Lyu J C and Peng X. 2023. Robust multi-view clustering with incomplete information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(1): 1055-1069 [DOI: 10.1109/TPAMI.2022.3155499]
Yang Y J, Qin H C, Zhou H, Wang C C, Guo T Y, Han K and Wang Y H. 2024. A robust audio deepfake detection system via multi-view feature // Proceedings of 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul, Korea (South): IEEE: 13131-13135 [DOI: 10.1109/ICASSP48485.2024.10446560]
Yu C E, Zhang X H, Duan Y X, Yan S B, Wang Z H, Xiang Y, Ji S L and Chen W Z. 2024. Diff-ID: an explainable identity difference quantification framework for deepfake detection. IEEE Transactions on Dependable and Secure Computing, 21(5): 5029-5045 [DOI: 10.1109/TDSC.2024.3364679]
Zhang X, Karaman S and Chang S F. 2019. Detecting and simulating artifacts in GAN fake images // 2019 IEEE International Workshop on Information Forensics and Security (WIFS). Delft, Netherlands: IEEE: 1-6 [DOI: 10.1109/WIFS47025.2019.9035107]
Zhao H Q, Wei T Y, Zhou W B, Zhang W M, Chen D D and Yu N H. 2021. Multi-attentional deepfake detection // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 2185-2194 [DOI: 10.1109/CVPR46437.2021.00222]
Zhuang W Y, Chu Q, Tan Z T, Liu Q K, Yuan H J, Miao C T, Luo Z X and Yu N H. 2022. UIA-ViT: unsupervised inconsistency-aware method based on vision transformer for face forgery detection // Proceedings of the 17th European Conference on Computer Vision. Tel Aviv, Israel: Springer: 391-407 [DOI: 10.1007/978-3-031-20065-6_23]