结合图像块比较与残差图估计的人脸伪造检测
Face forgery detection with image patch comparison and residual map estimation
2024, Vol. 29, No. 2, Pages: 457-467
Print publication date: 2024-02-16
DOI: 10.11834/jig.230149
冯才博, 刘春晓, 王昱烨, 周其当. 2024. 结合图像块比较与残差图估计的人脸伪造检测. 中国图象图形学报, 29(02):0457-0467
Feng Caibo, Liu Chunxiao, Wang Yuye, Zhou Qidang. 2024. Face forgery detection with image patch comparison and residual map estimation. Journal of Image and Graphics, 29(02):0457-0467
Objective
Because the data distributions of samples from different forgery types differ greatly, existing face forgery detection methods are not accurate enough and generalize poorly. To address this, this paper introduces the concepts of "image patch attribution purity" and "residual map estimation reliability" and proposes a face forgery detection method based on image patch comparison and residual map estimation.
Method
Apart from the backbone network, the proposed face forgery detection network consists of two main parts: a pure image patch comparison module and a reliable residual map estimation module. To avoid the interference caused by mixed features extracted from image patches that contain both face and background pixels, the pure image patch comparison module selects pure face patches containing only face pixels and pure background patches containing only background pixels, and detects forged images by comparing the differences between the pure features of these two kinds of patches. The purity of the patches guarantees the purity of the extracted features and thus improves the robustness of the feature comparison. Considering that pixels near the forged edge allow more accurate residual estimation than pixels far from it, the reliable residual map estimation module uses a distance-field-weighted residual loss, based on each pixel's distance to the forged edge, to guide training. This makes the network focus on the differences between the input image and the corresponding real image near the forged edge, and this attention to reliable information further strengthens the robustness of forgery detection.
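The patch-purity selection described above can be sketched as follows; the non-overlapping grid tiling, the 16-pixel patch size, and the function name `select_pure_patches` are illustrative assumptions rather than the paper's implementation:

```python
import numpy as np

def select_pure_patches(mask, patch_size=16):
    """Tile a binary face mask (True = face pixel) into non-overlapping
    patches and keep only the pure ones: patches whose pixels are all
    face or all background. Mixed patches are discarded so that features
    later extracted from each patch are not contaminated."""
    h, w = mask.shape
    face_idx, bg_idx = [], []
    for i in range(0, h - patch_size + 1, patch_size):
        for j in range(0, w - patch_size + 1, patch_size):
            block = mask[i:i + patch_size, j:j + patch_size]
            if block.all():        # pure face patch
                face_idx.append((i, j))
            elif not block.any():  # pure background patch
                bg_idx.append((i, j))
    return face_idx, bg_idx

# Toy mask: left half of a 64×64 image is face, right half is background,
# so the 4×4 patch grid yields 8 pure face and 8 pure background patches.
mask = np.zeros((64, 64), dtype=bool)
mask[:, :32] = True
face_idx, bg_idx = select_pure_patches(mask)
print(len(face_idx), len(bg_idx))  # → 8 8
```

In the paper's setting, the selected pure patches would then be fed to the backbone and the discrepancy between face-patch and background-patch features compared; a large discrepancy indicates a forged face.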
Result
Test results on the FF++ (FaceForensics++) dataset show that, compared with F2Trans-B, the best-performing method among the compared algorithms, our method improves the accuracy and AUC (area under the ROC curve) by 2.49% and 3.31%, respectively, and improves the accuracy on the FS (FaceSwap) and F2F (Face2Face) forgery types by 6.01% and 3.99%, respectively. In terms of generalization, cross-dataset tests against 11 existing methods show that, compared with the best of them, our method improves the video-level AUC and image-level AUC on the CDF (Celeb-DF) dataset by 1.85% and 1.03%, respectively.
Conclusion
Owing to the improved purity and reliability of the feature information, the proposed face forgery detection model surpasses the compared methods in both generalization ability and accuracy.
Objective
Face recognition has become part of our daily lives in recent years. However, with the rapid development of deep-learning-based face forgery techniques, the cost of face forgery has been considerably reduced, and unexpected risks to face recognition have arisen. If someone uses a fake face image to break into a face recognition system, our personal information and property may be compromised or even stolen. Yet distinguishing whether the face in an image is forged is difficult for the human eye. Moreover, existing face forgery detection methods exhibit poor generalization performance and struggle to defend against unknown attack samples because of the large data distribution gaps among different forgery samples. Therefore, a reliable and general face forgery detection method is urgently required. In this regard, we introduce the concepts of "patch attribution purity" and "residual estimation reliability" and propose a novel multitask learning network (PuRe) based on pure image patch comparison (PIPC) and reliable residual map estimation (RRME) to detect forged face images.
Method
Apart from the network backbone, our neural network consists of the PIPC module and the RRME module, both of which improve face forgery detection performance. On the one hand, if the face in an image is forged, the features extracted from face patches and background patches should be inconsistent. The PIPC module therefore compares the feature discrepancy between face and background patches to perform the detection task. However, if an image patch contains both face and background pixels, the features extracted from it mix face and background information, which disturbs the feature comparison between face and background patches and leads to overfitting on the training dataset. To address this problem, our PIPC module uses only pure image patches, which contain either face pixels only (pure face patches) or background pixels only (pure background patches). The purity of the image patches guarantees the purity of the extracted features, which in turn improves the robustness of the feature comparison. On the other hand, the residual map estimation task is designed to predict the difference between the input image and the corresponding real image, driving the network backbone to extract more generalizable image features and improving the accuracy of face forgery detection. However, for pixels far from the forged edges between the forgery and real regions, less information is available for estimating the residuals, making the estimates unreliable. To address this problem, a loss function called the distance field weighted residual loss (DWRLoss) is designed in the RRME module to compel the network to pay more attention to estimating the residuals near the forged edges. In the face (i.e., forgery) region, the farther a pixel is from the background region, the smaller the weight coefficient assigned to its loss. This attention to reliable residual information improves the robustness of face forgery detection. Finally, we adopt a multitask learning strategy to train the proposed network: the two learning tasks jointly guide the backbone to extract effective and generalizable features for face forgery detection.
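A minimal sketch of the distance-field weighting idea behind DWRLoss, assuming an exponential falloff with a hypothetical scale `sigma` and an L1 residual error (the paper's exact weighting formula may differ; `dwr_loss` is an illustrative name):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def dwr_loss(pred_residual, gt_residual, forgery_mask, sigma=8.0):
    """Distance-field-weighted residual loss (illustrative sketch).
    Pixels close to the forged edge, where residual estimation is most
    reliable, receive weights near 1; pixels far from the edge on either
    side receive exponentially smaller weights."""
    m = forgery_mask.astype(bool)
    # Distance of every pixel to the forged edge: inside the forgery
    # region, distance to the nearest background pixel; outside, distance
    # to the nearest forgery pixel.
    dist = distance_transform_edt(m) + distance_transform_edt(~m)
    weight = np.exp(-dist / sigma)
    err = np.abs(pred_residual - gt_residual)
    return float((weight * err).sum() / weight.sum())

# A perfect residual estimate yields zero loss regardless of the weighting.
mask = np.zeros((32, 32))
mask[8:24, 8:24] = 1
print(dwr_loss(np.ones((32, 32)), np.ones((32, 32)), mask))  # → 0.0
```

Normalizing by the weight sum keeps the loss scale comparable across masks of different sizes; during training, such a term would be combined with the classification loss under the multitask strategy.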
Result
Extensive experiments demonstrate the superiority of our method. Compared with the best existing methods, test results on the FaceForensics++ (FF++) dataset show that the proposed method improves the accuracy (ACC) and area under the receiver operating characteristic curve (AUC) of face forgery detection by 2.49% and 3.31%, respectively. Moreover, our method improves the detection ACC on the FF++ dataset for the FaceSwap (FS) and Face2Face (F2F) forgery types by 6.01% and 3.99%, respectively. In the cross-dataset tests, compared with 11 existing representative methods, our method increases the video-level and image-level AUC on the Celeb-DF (CDF) dataset by 1.85% and 1.03%, respectively.
Conclusion
The proposed neural network (PuRe), built on the PIPC and RRME modules, exhibits excellent generalization ability and performs better than existing methods owing to the purity and reliability of the extracted features.
Keywords: face forgery detection; deepfake; multi-task learning; generalization; pixel-wise supervision; convolutional neural network
Afchar D, Nozick V, Yamagishi J and Echizen I. 2018. MesoNet: a compact facial video forgery detection network//Proceedings of 2018 IEEE International Workshop on Information Forensics and Security. Hong Kong, China: IEEE: 1-7 [DOI: 10.1109/WIFS.2018.8630761]
Cao J Y, Ma C, Yao T P, Chen S, Ding S H and Yang X K. 2022. End-to-end reconstruction-classification learning for face forgery detection//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 4103-4112 [DOI: 10.1109/CVPR52688.2022.00408]
Cao S H, Liu X H, Mao X Q and Zou Q. 2022. A review of human face forgery and forgery-detection technologies. Journal of Image and Graphics, 27(4): 1023-1038 [DOI: 10.11834/jig.200466]
Cao S H, Zou Q, Mao X Q, Ye D P and Wang Z Y. 2021. Metric learning for anti-compression facial forgery detection//Proceedings of the 29th ACM International Conference on Multimedia. Virtual Event, China: ACM: 1929-1937 [DOI: 10.1145/3474085.3475347]
Chen L, Zhang Y, Song Y B, Liu L Q and Wang J. 2022. Self-supervised learning of adversarial example: towards good generalizations for deepfake detection//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 18689-18698 [DOI: 10.1109/CVPR52688.2022.01815]
Chen S, Yao T P, Chen Y, Ding S H, Li J L and Ji R R. 2021a. Local relation learning for face forgery detection. Proceedings of the AAAI Conference on Artificial Intelligence, 35(2): 1081-1088 [DOI: 10.1609/aaai.v35i2.16193]
Chen Z H and Yang H. 2021. Attentive semantic exploring for manipulated face detection//Proceedings of 2021 IEEE International Conference on Acoustics, Speech and Signal Processing. Toronto, Canada: IEEE: 1985-1989 [DOI: 10.1109/ICASSP39728.2021.9414225]
Chen Z K, Xie L X, Pang S M, He Y and Zhang B. 2021b. MagDR: mask-guided detection and reconstruction for defending deepfakes//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 9010-9019 [DOI: 10.1109/CVPR46437.2021.00890]
Cozzolino D, Poggi G and Verdoliva L. 2017. Recasting residual-based local descriptors as convolutional neural networks: an application to image forgery detection//Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security. Pennsylvania, USA: ACM: 159-164 [DOI: 10.1145/3082031.3083247]
Das S, Seferbekov S, Datta A, Islam M S and Amin M R. 2021. Towards solving the DeepFake problem: an analysis on improving DeepFake detection using dynamic face augmentation//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision Workshops. Montreal, Canada: IEEE: 3769-3778 [DOI: 10.1109/ICCVW54120.2021.00421]
Du M N, Pentyala S, Li Y N and Hu X. 2020. Towards generalizable deepfake detection with locality-aware autoencoder//Proceedings of the 29th ACM International Conference on Information and Knowledge Management. Virtual Event, Ireland: ACM: 325-334 [DOI: 10.1145/3340531.3411892]
Dufour N and Gully A. 2019. Contributing data to deepfake detection research [EB/OL]. [2023-03-09]. https://torontoai.org/2019/09/23/contributing-data-to-deepfake-detection-research/
Guo Z Q, Yang G B, Chen J Y and Sun X M. 2021. Fake face detection via adaptive manipulation traces extraction network. Computer Vision and Image Understanding, 204: #103170 [DOI: 10.1016/j.cviu.2021.103170]
Karras T, Aila T, Laine S and Lehtinen J. 2018. Progressive growing of GANs for improved quality, stability, and variation [EB/OL]. [2023-03-09]. https://arxiv.org/pdf/1710.10196.pdf
Kim M, Tariq S and Woo S S. 2021. FReTAL: generalizing deepfake detection using knowledge distillation and representation learning//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Nashville, USA: IEEE: 1001-1012 [DOI: 10.1109/CVPRW53098.2021.00111]
Kowalski M. 2016. FaceSwap [EB/OL]. [2023-03-09]. https://github.com/marekkowalski/faceswap
Li L Z, Bao J M, Zhang T, Yang H, Chen D, Wen F and Guo B N. 2020a. Face x-ray for more general face forgery detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 5000-5009 [DOI: 10.1109/CVPR42600.2020.00505]
Li Y Z and Lyu S W. 2019. Exposing deepfake videos by detecting face warping artifacts [EB/OL]. [2023-03-09]. https://arxiv.org/pdf/1811.00656.pdf
Li Y Z, Yang X, Sun P, Qi H G and Lyu S W. 2020b. Celeb-DF: a large-scale challenging dataset for deepfake forensics//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 3204-3213 [DOI: 10.1109/CVPR42600.2020.00327]
Liang J H, Shi H F and Deng W H. 2022. Exploring disentangled content information for face forgery detection//Proceedings of the 17th European Conference on Computer Vision. Tel Aviv, Israel: Springer: 128-145 [DOI: 10.1007/978-3-031-19781-9_8]
Liu H G, Li X D, Zhou W B, Chen Y F, He Y, Xue H, Zhang W M and Yu N H. 2021. Spatial-phase shallow learning: rethinking face forgery detection in frequency domain//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 772-781 [DOI: 10.1109/CVPR46437.2021.00083]
Luo Y C, Zhang Y, Yan J C and Liu W. 2021. Generalizing face forgery detection with high-frequency features//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 16312-16321 [DOI: 10.1109/CVPR46437.2021.01605]
Miao C T, Tan Z C, Chu Q, Liu H, Hu H G and Yu N H. 2023. F2Trans: high-frequency fine-grained transformer for face forgery detection. IEEE Transactions on Information Forensics and Security, 18: 1039-1051 [DOI: 10.1109/TIFS.2022.3233774]
Ni Y S, Meng D P, Yu C Q, Quan C B, Ren D C and Zhao Y J. 2022. CORE: consistent representation learning for face forgery detection//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. New Orleans, USA: IEEE: 12-21 [DOI: 10.1109/CVPRW56347.2022.00011]
Nirkin Y, Wolf L, Keller Y and Hassner T. 2022. DeepFake detection based on discrepancies between faces and their context. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10): 6111-6121 [DOI: 10.1109/TPAMI.2021.3093446]
Qian Y Y, Yin G J, Sheng L, Chen Z X and Shao J. 2020. Thinking in frequency: face forgery detection by mining frequency-aware clues//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 86-103 [DOI: 10.1007/978-3-030-58610-2_6]
Rössler A, Cozzolino D, Verdoliva L, Riess C, Thies J and Niessner M. 2019. FaceForensics++: learning to detect manipulated facial images//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 1-11 [DOI: 10.1109/ICCV.2019.00009]
Shiohara K and Yamasaki T. 2022. Detecting deepfakes with self-blended images//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 18699-18708 [DOI: 10.1109/CVPR52688.2022.01816]
Thies J, Zollhöfer M and Nießner M. 2019. Deferred neural rendering: image synthesis using neural textures. ACM Transactions on Graphics, 38(4): #66 [DOI: 10.1145/3306346.3323035]
Thies J, Zollhöfer M, Stamminger M, Theobalt C and Niessner M. 2016. Face2Face: real-time face capture and reenactment of RGB videos//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 2387-2395 [DOI: 10.1109/CVPR.2016.262]
Tora M. 2019. Deepfakes [EB/OL]. [2023-03-09]. https://github.com/deepfakes/faceswap
Wang C R and Deng W H. 2021. Representative forgery mining for fake face detection//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 14918-14927 [DOI: 10.1109/CVPR46437.2021.01468]
Wang J, Sun Y L and Tang J H. 2022a. LiSiam: localization invariance siamese network for deepfake detection. IEEE Transactions on Information Forensics and Security, 17: 2425-2436 [DOI: 10.1109/TIFS.2022.3186803]
Wang J K, Wu Z X, Ouyang W H, Han X T, Chen J J, Jiang Y G and Li S N. 2022c. M2TR: multi-modal multi-scale transformers for deepfake detection//Proceedings of 2022 International Conference on Multimedia Retrieval. Newark, USA: ACM: 615-623 [DOI: 10.1145/3512527.3531415]
Wang Z, Guo Y W and Zuo W M. 2022b. Deepfake forensics via an adversarial game. IEEE Transactions on Image Processing, 31: 3541-3552 [DOI: 10.1109/TIP.2022.3172845]
Yang J C, Li A Y, Xiao S, Lu W and Gao X B. 2021. MTD-Net: learning to detect deepfakes images by multi-scale texture difference. IEEE Transactions on Information Forensics and Security, 16: 4234-4245 [DOI: 10.1109/TIFS.2021.3102487]
Zhang B G, Li S, Feng G R, Qian Z X and Zhang X P. 2022. Patch diffusion: a general module for face manipulation detection. Proceedings of the AAAI Conference on Artificial Intelligence, 36(3): 3243-3251 [DOI: 10.1609/aaai.v36i3.20233]
Zhao H Q, Wei T Y, Zhou W B, Zhang W M, Chen D D and Yu N H. 2021b. Multi-attentional deepfake detection//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 2185-2194 [DOI: 10.1109/CVPR46437.2021.00222]
Zhao T C, Xu X, Xu M Z, Ding H, Xiong Y J and Xia W. 2021a. Learning self-consistency for deepfake detection//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 15003-15013 [DOI: 10.1109/ICCV48922.2021.01475]
Zhou P, Han X T, Morariu V I and Davis L S. 2017. Two-stream neural networks for tampered face detection//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu, USA: IEEE: 1831-1839 [DOI: 10.1109/CVPRW.2017.229]