边缘引导的双注意力图像拼接检测网络
BDA-Net: boundary-guided dual attention network for image splicing detection
2024年29卷第2期 页码: 430-443
纸质出版日期: 2024-02-16
DOI: 10.11834/jig.230103
吴晶辉, 严彩萍, 李红, 刘仁海. 2024. 边缘引导的双注意力图像拼接检测网络. 中国图象图形学报, 29(02):0430-0443
Wu Jinghui, Yan Caiping, Li Hong, Liu Renhai. 2024. BDA-Net: boundary-guided dual attention network for image splicing detection. Journal of Image and Graphics, 29(02):0430-0443
目的
伪造图像给众多行业埋下了隐患，会造成大量潜在的经济损失。现有的图像拼接检测方法大多忽略了多尺度信息融合，检测性能仍不理想，因此需要设计一种更有效的图像拼接检测方法。
方法
提出一种边缘引导的双注意力图像拼接检测网络(boundary-guided dual attention network,BDA-Net),该网络通过将空间通道依赖和边缘预测集成到网络提取的特征中来得到预测结果。首先,提出一种称为预测分支的编解码模型,该分支作为模型的主干网络,可以提取和融合不同分辨率的特征图。其次,为了捕捉不同维度的依赖关系并增强网络对感兴趣区域的关注能力,设计了一个沿多维度进行特征编码的坐标—空间注意力模块(coordinate-spatial attention module,CSAM)。最后,设计了一条边缘引导分支来捕获篡改区域和非篡改区域之间的微小边缘痕迹,以辅助预测分支进行更好的分割。
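下面给出坐标—空间注意力模块(CSAM)设计思路的一个极简PyTorch示意：先沿高、宽两个方向做一维池化编码得到坐标注意力，再叠加空间注意力。其中通道压缩率、卷积核大小与融合细节均为示意性假设，具体结构以论文正文为准。

```python
import torch
import torch.nn as nn

class CSAMSketch(nn.Module):
    """CSAM思路的极简示意：坐标注意力(沿H、W的一维编码)叠加空间注意力。
    reduction等超参数均为假设值，并非论文原始设置。"""
    def __init__(self, channels, reduction=16):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)
        # 空间注意力：通道均值/最大图拼接后经7×7卷积得到空间权重
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        n, c, h, w = x.size()
        # 坐标注意力：分别沿W、H方向做全局平均池化，得到两条一维特征
        x_h = x.mean(dim=3, keepdim=True)                      # (n, c, h, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # (n, c, w, 1)
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # (n, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (n, c, 1, w)
        x = x * a_h * a_w
        # 空间注意力
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.max(dim=1, keepdim=True)[0]], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```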
结果
实验使用4个图像拼接数据集与多种方法进行比较，评价指标为F1值。在Columbia数据集中，与排名第1的模型相比，F1值仅相差1.6%。在NIST16 Splicing(National Institute of Standards and Technology 16 Splicing)数据集中，F1值与最好的模型略有差距。而在检测难度更高的CASIA2.0 Splicing(Chinese Academy of Sciences Institute of Automation Dataset 2.0 Splicing)和IMD2020(Image Manipulated Datasets 2020)数据集中，BDA-Net的F1值相比排名第2的模型分别提高了15.3%和11.9%。为了验证模型的鲁棒性，还对图像施加JPEG压缩、高斯模糊、锐化、高斯噪声和椒盐噪声攻击。实验结果表明，BDA-Net的鲁棒性明显优于其他模型。
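鲁棒性实验中的五种攻击可用如下Python脚本近似模拟(基于OpenCV与NumPy)。压缩质量、模糊核大小与噪声强度等参数均为示意性假设，并非论文实验的原始设置。

```python
import cv2
import numpy as np

def jpeg_compress(img, quality=70):
    """JPEG压缩攻击：按给定质量重新编码后解码（质量值为假设）。"""
    ok, buf = cv2.imencode('.jpg', img, [cv2.IMWRITE_JPEG_QUALITY, quality])
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)

def gaussian_blur(img, ksize=5):
    """高斯模糊攻击。"""
    return cv2.GaussianBlur(img, (ksize, ksize), 0)

def sharpen(img):
    """锐化攻击：常见的3×3锐化核（具体核为假设）。"""
    kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=np.float32)
    return cv2.filter2D(img, -1, kernel)

def gaussian_noise(img, sigma=10):
    """高斯噪声攻击。"""
    noise = np.random.normal(0, sigma, img.shape)
    return np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)

def salt_pepper(img, ratio=0.01):
    """椒盐噪声攻击：按比例随机置0（椒）或置255（盐）。"""
    out = img.copy()
    mask = np.random.rand(*img.shape[:2])
    out[mask < ratio / 2] = 0
    out[mask > 1 - ratio / 2] = 255
    return out
```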
结论
本文方法充分利用深度学习模型的优点和图像拼接检测领域的专业知识，能有效提升模型性能。与现有的检测方法相比，具有更强的检测能力和更好的稳定性。
Objective
The rapid development of the internet and the proliferation of effective and user-friendly picture editing software have resulted in an explosion of modified images on the internet. Although these modified images can bring some benefits (e.g., landscape beautification and face photo enhancement), they also have many negative effects on people's lives, such as falsified transaction records, fabricated news and fake evidence in court. Maliciously exploited tampered images can cause immeasurable damage to individuals and society. Recent studies on image splicing detection have demonstrated the effectiveness of convolutional neural networks in improving localization performance. However, they have generally ignored multiscale information fusion, which is essential for locating tampered regions of various sizes. Moreover, the performance of most existing detection methods remains unsatisfactory. Therefore, an effective image splicing detection method needs to be designed.
Method
In this study, we propose a novel boundary-guided dual attention network (BDA-Net) that integrates spatial-channel dependency and boundary prediction into the features extracted by the network. In particular, we present a new encoder-decoder model, named the prediction branch, to extract and fuse feature maps with different resolutions; this model constitutes the backbone of BDA-Net. A coordinate-spatial attention module (CSAM) is designed and embedded into the deep layers of feature extraction to capture long-range dependencies, thereby augmenting the representations of regions of interest, while the computational complexity is kept low by aggregating features with three one-dimensional encodings. In addition, we present a boundary-guided branch to capture the tiny border artifacts between tampered and non-tampered regions; it is modeled as a binary segmentation task to enhance the detailed prediction of our network. A multitask loss function is designed to constrain the network. The loss function consists of two parts: a pixel-level localization loss and a boundary loss. The localization loss combines a weighted cross-entropy loss with a Dice loss. In a tampered image, the tampered region usually occupies a much smaller proportion than the non-tampered region, which causes sample imbalance. The weighted cross-entropy loss assigns different weights to different training samples, directing the model's attention to the samples with high weights. The Dice loss measures the pixel-level similarity between the predicted and ground-truth results; under class imbalance, its weighting adapts to improve the accuracy and robustness of the segmentation model. The boundary loss consists of the Dice loss alone. Boundary labels are used to guide the network to predict the splicing boundary of a tampered image. In a boundary label, boundary pixels are far fewer than non-boundary pixels, which leads to severe class imbalance; this phenomenon is especially evident in high-resolution images. Using the Dice loss as the boundary loss therefore helps the model learn features from such extremely unbalanced data. The network is implemented in the PyTorch 2.0 framework. The input images and ground-truth maps are resized to 500 × 500 pixels for training. The Adam optimization algorithm is used, with an initial learning rate of 1e-4 and a cosine annealing warm-restarts learning rate scheduler. The batch size is set to 2.
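A minimal PyTorch sketch of the multitask loss and training configuration described above follows. The positive-class weight, the weighting between the localization and boundary terms, and the scheduler restart period T_0 are assumptions, since the abstract does not specify them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def dice_loss(logits, target, eps=1.0):
    """Dice loss on sigmoid probabilities; robust to class imbalance."""
    prob = torch.sigmoid(logits).flatten(1)
    target = target.flatten(1)
    inter = (prob * target).sum(dim=1)
    return 1 - (2 * inter + eps) / (prob.sum(dim=1) + target.sum(dim=1) + eps)

def bda_loss(mask_logits, mask_gt, edge_logits, edge_gt,
             pos_weight=5.0, lambda_edge=1.0):
    # Localization loss: weighted cross-entropy (here weighted BCE with an
    # assumed weight up-weighting the rarer tampered pixels) plus Dice loss.
    wce = F.binary_cross_entropy_with_logits(
        mask_logits, mask_gt,
        pos_weight=torch.tensor(pos_weight, device=mask_logits.device))
    localization = wce + dice_loss(mask_logits, mask_gt).mean()
    # Boundary loss: Dice only, since boundary pixels are extremely sparse.
    boundary = dice_loss(edge_logits, edge_gt).mean()
    return localization + lambda_edge * boundary

# Training configuration stated in the abstract: Adam optimizer, initial
# learning rate 1e-4, cosine annealing with warm restarts, batch size 2,
# inputs resized to 500 x 500. T_0 below is an assumed restart period.
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)  # placeholder for BDA-Net
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10)
```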
Result
We use four image splicing datasets in our experiments: the Columbia dataset, the NIST16 splicing dataset (National Institute of Standards and Technology 16 Splicing), the CASIA2.0 splicing dataset (Chinese Academy of Sciences Institute of Automation Dataset 2.0 Splicing) and the IMD2020 dataset (Image Manipulated Datasets 2020). All spliced images in the Columbia dataset were created from real images, without any post-processing, and are high-resolution and uncompressed. The NIST16 dataset is a very challenging dataset provided by the National Institute of Standards and Technology. The CASIA2.0 dataset is a popular image tampering detection dataset with rich and clear image content. The IMD2020 dataset contains 2 010 real images downloaded from the internet together with their corresponding labels. We choose four deep learning based detection methods to compare with the proposed BDA-Net: U-Net, DeepLab V3+, RRU-Net (ringed residual U-Net) and MTSE-Net (multi-task SE-network). U-Net is a classical semantic segmentation model that can be applied to many tasks. DeepLab V3+ combines the spatial pyramid pooling module with an encoder-decoder structure, yielding a semantic segmentation model that can encode multiscale context information and capture clear target edges. RRU-Net is a ringed residual network based on U-Net; it reinforces features through the propagation and feedback of residuals in the convolutional neural network (CNN), making the difference between tampered and non-tampered regions more evident. MTSE-Net is a two-branch model that realizes tamper detection by fusing the features of its two branches. The quantitative evaluation metric is the F1 measure, a commonly used index for evaluating classification models. On the Columbia dataset, the F1 values of the proposed BDA-Net and the top-ranked model differ by only 1.6%. On the NIST16 splicing dataset, the F1 value of BDA-Net differs only slightly from those of the best models. On the more difficult datasets, namely, the CASIA2.0 splicing dataset and the IMD2020 dataset, the F1 values of BDA-Net are 15.3% and 11.9% higher than those of the second-ranked model, respectively. Moreover, we apply five attacks, namely, JPEG compression, Gaussian blur, sharpening, Gaussian noise and salt-and-pepper noise, to the images to verify the robustness of the proposed model. Experiments show that the robustness of our model is significantly better than that of the other models.
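For reference, the pixel-level F1 measure is the harmonic mean of precision and recall over the predicted tampered pixels, F1 = 2PR/(P + R). A minimal implementation, assuming probability maps binarized at a threshold of 0.5, could look as follows.

```python
import numpy as np

def pixel_f1(pred_prob, gt_mask, thresh=0.5, eps=1e-8):
    """Pixel-level F1 between a predicted tamper-probability map and the
    binary ground-truth mask: F1 = 2*P*R / (P + R)."""
    pred = (pred_prob >= thresh).astype(np.uint8).ravel()
    gt = (gt_mask > 0).astype(np.uint8).ravel()
    tp = np.sum((pred == 1) & (gt == 1))
    precision = tp / (pred.sum() + eps)
    recall = tp / (gt.sum() + eps)
    return 2 * precision * recall / (precision + recall + eps)
```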
Conclusion
The image splicing detection method proposed in this study takes full advantage of deep learning models and of domain expertise in image splicing detection, effectively improving model performance. The experimental results on four splicing datasets show that our model has stronger detection capability and better stability than existing splicing detection methods.
图像取证；图像篡改检测；卷积神经网络(CNN)；注意力机制；融合算法
image forensics; image splicing detection; convolutional neural network (CNN); attention mechanism; fusion algorithm
Bappy J H, Roy-Chowdhury A K, Bunk J, Nataraj L and Manjunath B S. 2017. Exploiting spatial structure for localizing manipulated image regions//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 4980-4989 [DOI: 10.1109/ICCV.2017.532]
Bappy J H, Simons C, Nataraj L, Manjunath B S and Roy-Chowdhury A K. 2019. Hybrid LSTM and encoder-decoder architecture for detection of image forgeries. IEEE Transactions on Image Processing, 28(7): 3286-3300 [DOI: 10.1109/TIP.2019.2895466]
Bayar B and Stamm M C. 2018. Constrained convolutional neural networks: a new approach towards general purpose image manipulation detection. IEEE Transactions on Information Forensics and Security, 13(11): 2691-2706 [DOI: 10.1109/TIFS.2018.2825953]
Bi X L, Wei Y, Xiao B and Li W S. 2019. RRU-Net: the ringed residual U-Net for image splicing forgery detection//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Long Beach, USA: IEEE: 30-39 [DOI: 10.1109/CVPRW.2019.00010]
Bianchi T and Piva A. 2012. Image forgery localization via block-grained analysis of JPEG artifacts. IEEE Transactions on Information Forensics and Security, 7(3): 1003-1017 [DOI: 10.1109/TIFS.2012.2187516]
Bochkovskiy A, Wang C Y and Liao H Y M. 2020. YOLOv4: optimal speed and accuracy of object detection [EB/OL]. [2022-11-10]. https://arxiv.org/pdf/2004.10934.pdf
Canny J. 1986. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8(6): 679-698 [DOI: 10.1109/TPAMI.1986.4767851]
Chen L C, Zhu Y K, Papandreou G, Schroff F and Adam H. 2018. Encoder-decoder with atrous separable convolution for semantic image segmentation//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 833-851 [DOI: 10.1007/978-3-030-01234-2_49]
Dong J, Wang W and Tan T N. 2013. CASIA image tampering detection evaluation database//Proceedings of 2013 IEEE China Summit and International Conference on Signal and Information Processing. Beijing, China: IEEE: 422-426 [DOI: 10.1109/ChinaSIP.2013.6625374]
Ferrara P, Bianchi T, De Rosa A and Piva A. 2012. Image forgery localization via fine-grained analysis of CFA artifacts. IEEE Transactions on Information Forensics and Security, 7(5): 1566-1577 [DOI: 10.1109/TIFS.2012.2202227]
Fridrich J and Kodovsky J. 2012. Rich models for steganalysis of digital images. IEEE Transactions on Information Forensics and Security, 7(3): 868-882 [DOI: 10.1109/TIFS.2012.2190402]
Goljan M and Fridrich J. 2015. CFA-aware features for steganalysis of color images//Proceedings of SPIE 9409, Media Watermarking, Security, and Forensics 2015. San Francisco, USA: SPIE: 279-291 [DOI: 10.1117/12.2078399]
Guan H Y, Kozak M, Robertson E, Lee Y, Yates A N, Delgado A, Zhou D L, Kheyrkhah T, Smith J and Fiscus J. 2019. MFC datasets: large-scale benchmark datasets for media forensic challenge evaluation//Proceedings of 2019 IEEE Winter Applications of Computer Vision Workshops. Waikoloa, USA: IEEE: 63-72 [DOI: 10.1109/WACVW.2019.00018]
Hochreiter S and Schmidhuber J. 1997. Long short-term memory. Neural Computation, 9(8): 1735-1780 [DOI: 10.1162/neco.1997.9.8.1735]
Hou Q B, Zhou D Q and Feng J S. 2021. Coordinate attention for efficient mobile network design//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 13708-13717 [DOI: 10.1109/CVPR46437.2021.01350]
Hu J, Shen L and Sun G. 2018. Squeeze-and-excitation networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7132-7141 [DOI: 10.1109/CVPR.2018.00745]
Hu X F, Zhang Z H, Jiang Z Y, Chaudhuri S, Yang Z H and Nevatia R. 2020. SPAN: spatial pyramid attention network for image manipulation localization//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 312-328 [DOI: 10.1007/978-3-030-58589-1_19]
Kwon M J, Yu I J, Nam S H and Lee H K. 2021. CAT-Net: compression artifact tracing network for detection and localization of image splicing//Proceedings of 2021 IEEE Winter Conference on Applications of Computer Vision. Waikoloa, USA: IEEE: 375-384 [DOI: 10.1109/WACV48630.2021.00042]
Li C T and Li Y. 2012. Color-decoupled photo response non-uniformity for digital image forensics. IEEE Transactions on Circuits and Systems for Video Technology, 22(2): 260-271 [DOI: 10.1109/TCSVT.2011.2160750]
Li X Y, Ye Z H, Wei S K, Chen Z, Chen X T, Tian Y H, Dang J W, Fu S J and Zhao Y. 2023. 3D object detection for autonomous driving from image: a survey—benchmarks, constraints and error analysis. Journal of Image and Graphics, 28(6): 1709-1740
李熙莹, 叶芝桧, 韦世奎, 陈泽, 陈小彤, 田永鸿, 党建武, 付树军, 赵耀. 2023. 基于图像的自动驾驶3D目标检测综述——基准、制约因素和误差分析. 中国图象图形学报, 28(6): 1709-1740 [DOI: 10.11834/jig.230036]
Li Y D, Guo K, Lu Y G and Liu L. 2021. Cropping and attention based approach for masked face recognition. Applied Intelligence, 51(5): 3012-3025 [DOI: 10.1007/s10489-020-02100-9]
Lin Z C, He J F, Tang X O and Tang C K. 2009. Fast, automatic and fine-grained tampered JPEG image detection via DCT coefficient analysis. Pattern Recognition, 42(11): 2492-2501 [DOI: 10.1016/j.patcog.2009.03.019]
Lyu S, Pan X Y and Zhang X. 2014. Exposing region splicing forgeries with blind local noise estimation. International Journal of Computer Vision, 110(2): 202-221 [DOI: 10.1007/s11263-013-0688-y]
Mahdian B and Saic S. 2009. Using noise inconsistencies for blind image forensics. Image and Vision Computing, 27(10): 1497-1503 [DOI: 10.1016/j.imavis.2009.02.001]
Mei Y Q, Fan Y C and Zhou Y Q. 2021. Image super-resolution with non-local sparse attention//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 3516-3525 [DOI: 10.1109/CVPR46437.2021.00352]
Newell A, Yang K Y and Deng J. 2016. Stacked hourglass networks for human pose estimation//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 483-499 [DOI: 10.1007/978-3-319-46484-8_29]
Ng T T and Chang S F. 2004. A Data Set of Authentic and Spliced Image Blocks. ADVENT Technical Report #203-2004-3. Columbia University
Novozamsky A, Mahdian B and Saic S. 2020. IMD2020: a large-scale annotated dataset tailored for detecting manipulated images//Proceedings of 2020 IEEE Winter Applications of Computer Vision Workshops. Snowmass, USA: IEEE: 71-80 [DOI: 10.1109/WACVW50321.2020.9096940]
Olutomilayo K T, Bahramgiri M, Nooshabadi S, Oh J, Lakehal-Ayat M, Rogan D and Fuhrmann D R. 2021. Dataset for trailer angle estimation using radar point clouds. Data in Brief, 38: #107305 [DOI: 10.1016/j.dib.2021.107305]
Qi C R, Yi L, Su H and Guibas L J. 2017. PointNet++: deep hierarchical feature learning on point sets in a metric space//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 5105-5114 [DOI: 10.5555/3295222.3295263]
Ren S Q, He K M, Girshick R and Sun J. 2017. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6): 1137-1149 [DOI: 10.1109/TPAMI.2016.2577031]
Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer: 234-241 [DOI: 10.1007/978-3-319-24574-4_28]
Salloum R, Ren Y Z and Kuo C C J. 2018. Image splicing localization using a multi-task fully convolutional network (MFCN). Journal of Visual Communication and Image Representation, 51: 201-209 [DOI: 10.1016/j.jvcir.2018.01.010]
Shelhamer E, Long J and Darrell T. 2017. Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4): 640-651 [DOI: 10.1109/TPAMI.2016.2572683]
Simonyan K and Zisserman A. 2014. Very deep convolutional networks for large-scale image recognition [EB/OL]. [2022-11-10]. https://arxiv.org/pdf/1409.1556.pdf
Wang J W, Wang H, Li J, Luo X Y, Shi Y Q and Jha S K. 2020. Detecting double JPEG compressed color images with the same quantization matrix in spherical coordinates. IEEE Transactions on Circuits and Systems for Video Technology, 30(8): 2736-2749 [DOI: 10.1109/TCSVT.2019.2922309]
Woo S, Park J, Lee J Y and Kweon I S. 2018. CBAM: convolutional block attention module//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 3-19 [DOI: 10.1007/978-3-030-01234-2_1]
Wu J W, Zhou W J, Luo T, Yu L and Lei J S. 2021. Multiscale multilevel context and multimodal fusion for RGB-D salient object detection. Signal Processing, 178: #107766 [DOI: 10.1016/j.sigpro.2020.107766]
Wu Y, AbdAlmageed W and Natarajan P. 2019. ManTra-Net: manipulation tracing network for detection and localization of image forgeries with anomalous features//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 9535-9544 [DOI: 10.1109/CVPR.2019.00977]
Zhang Y, Goh J, Win L L and Thing V. 2016. Image region forgery detection: a deep learning approach//Mathur A and Roychoudhury A, eds. Proceedings of the Singapore Cyber-Security Conference (SG-CRC). Singapore: IOS Press Ebooks: 1-11 [DOI: 10.3233/978-1-61499-617-0-1]
Zhang Y L, Zhu G P, Wu L G, Kwong S, Zhang H L and Zhou Y C. 2022. Multi-task SE-network for image splicing localization. IEEE Transactions on Circuits and Systems for Video Technology, 32(7): 4828-4840 [DOI: 10.1109/TCSVT.2021.3123829]
Zhou P, Han X T, Morariu V I and Davis L S. 2018. Learning rich features for image manipulation detection//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 1053-1061 [DOI: 10.1109/CVPR.2018.00116]