Pyramid Transformer combined with shallow CNN for substation image tampering detection
2024, Vol. 29, No. 2, Pages: 444-456
Print publication date: 2024-02-16
DOI: 10.11834/jig.230202
Xing Jianhao, Tian Xiuxia, Han Yi. 2024. Pyramid Transformer combined with shallow CNN for substation image tampering detection. Journal of Image and Graphics, 29(02): 0444-0456
Objective
Image information has become particularly important with the widespread application of intelligent power inspection. However, the rapid development of image tampering technology gives malicious actors a new way to harm power systems. As an important component of power systems, substations are responsible for the interconversion of different voltage levels. Ensuring a stable voltage output around the clock and the reasonable use of substation resources is the basis for the safe and stable operation of an entire power network. If collected substation images are maliciously tampered with, the tampering may not only cause a smart grid system to fail but also lead operators to misjudge the actual state of the substation, eventually causing power system failures and even major accidents such as large-scale power outages, resulting in irreversible losses to national production. Detecting tampered substation images is therefore a key task in ensuring the stability of power systems. The complex backgrounds of tampered images and the varying scales of tampered content cause existing detection models to suffer from false detection and missed detection. Meanwhile, research on image splicing tampering detection in power scenes remains scarce. Accordingly, this study proposes a dual-channel detection model for spliced tampered images in substation scenes.
Method
The model consists of three parts: a Transformer channel with a feature pyramid structure, a shallow convolutional neural network (CNN) channel, and a network head. The input is a tampered image of size 512 × 512 × 3, and the output is the detection and localization result for the tampered image. Both channels use deep learning methods to adaptively extract features, one from the original color image and the other from the residual image. The original color image contains rich color features and content information, while the residual image highlights the edges of the tampered region, effectively addressing the difficulty of extracting tampering features caused by the diversity of tampered images. The feature pyramid structure Transformer channel serves as the primary feature extraction channel and consists of the pyramid-structured Transformer and a progressive local decoder (PLD). The Transformer efficiently extracts features and establishes connections between feature points via global attention from the first layer of the model, operating over a global receptive field. Meanwhile, the pyramid structure gives the network better generalization and multi-scale feature processing capability. The PLD enables features with different depths and expressiveness to guide and fuse with one another, addressing attention scattering and the underestimation of local features to improve detail processing capability. The shallow CNN channel serves as an auxiliary detection channel: the shallow network extracts the edge features of the tampered region in the residual image, enabling the model to locate the tampered region more easily within the overall contour. The residual block, a residual network module, forms the backbone of the shallow network; its input is the residual image generated from the tampered image through a high-pass filtering layer.
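The residual-image generation step described above can be sketched with a fixed high-pass kernel. The specific 3 × 3 Laplacian-style kernel below is an illustrative assumption, not the paper's exact filter bank (SRM-style learned or fixed filter sets are another common choice); the point is that a zero-sum kernel suppresses flat content and keeps edges:

```python
import numpy as np

def high_pass_residual(image: np.ndarray) -> np.ndarray:
    """Produce a residual image that suppresses content and keeps edges.

    `image` is a 2-D grayscale array. The kernel is an illustrative
    Laplacian-style high-pass filter (zero-sum), not the paper's exact
    filter bank.
    """
    kernel = np.array([[-1, -1, -1],
                       [-1,  8, -1],
                       [-1, -1, -1]], dtype=np.float64) / 8.0
    h, w = image.shape
    padded = np.pad(image.astype(np.float64), 1, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return out

# A flat region yields a near-zero residual; an intensity step (a splicing
# boundary, for example) yields a strong response along the edge.
flat = np.full((8, 8), 100.0)
print(np.allclose(high_pass_residual(flat), 0.0))
```

Because the kernel sums to zero, uniform image content is removed entirely, which is why the residual branch can concentrate on tampering edges rather than scene content.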
The parallel axial attention block introduces dilated convolutions of different sizes to enlarge the receptive field of the shallow network, and the parallel axial attention mechanism helps the network extract contextual semantic information. The features of the two branches are fused along the channel dimension before entering the network head; the experiments in this study show that concatenation along the channel dimension is more effective than element-wise addition. Finally, the network head detects the presence or absence of tampered regions in the image and locates them accurately.
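The fusion choice above (channel-wise concatenation rather than element-wise addition) can be illustrated with a minimal NumPy sketch; the feature-map shapes are assumptions for illustration only:

```python
import numpy as np

# Illustrative feature maps from the two branches, shape (C, H, W).
rng = np.random.default_rng(0)
transformer_feat = rng.random((64, 32, 32))  # pyramid Transformer branch
cnn_feat = rng.random((64, 32, 32))          # shallow CNN branch

# Channel-wise concatenation keeps the two branches' information in
# separate channels, doubling the channel count and leaving it to the
# network head to learn how to weigh them.
fused_concat = np.concatenate([transformer_feat, cnn_feat], axis=0)

# Element-wise addition mixes the branches irreversibly at equal weight.
fused_add = transformer_feat + cnn_feat

print(fused_concat.shape)  # (128, 32, 32)
print(fused_add.shape)     # (64, 32, 32)
```

Concatenation preserves each branch's features intact at the cost of more channels, which matches the study's finding that it outperforms element-wise accumulation for this dual-channel design.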
Result
The experiments are first conducted on the pretraining datasets to obtain pretrained weights. The test results show that the proposed model detects various tampering targets well. The model is then fine-tuned from the pretrained weights and compared with four models of the same type on the self-made substation splicing tampered dataset (SSSTD), CASIA, and NIST16. Four evaluation metrics, namely precision, recall, F1, and average precision, are selected for quantitative analysis. On SSSTD, the precision, recall, F1, and average precision of the proposed model improve by 0.12%, 2.17%, 1.24%, and 7.71%, respectively, over the second-best model. On CASIA, the proposed model again achieves the best results on all four metrics. On NIST16, the compared detection models all reach high precision, while the proposed model attains higher recall, and its F1 and average precision improve substantially over the four compared models. Qualitatively, the proposed model mitigates false detection and missed detection while achieving higher localization accuracy. The overall detection effect is better than that of the other models.
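The reported metrics follow their standard definitions. As a minimal sketch, pixel-level precision, recall, and F1 against a ground-truth tampering mask can be computed as below; the small binary masks are illustrative, not data from the paper:

```python
import numpy as np

def pixel_metrics(pred: np.ndarray, gt: np.ndarray):
    """Pixel-level precision, recall, and F1 for binary tampering masks."""
    tp = np.logical_and(pred == 1, gt == 1).sum()
    fp = np.logical_and(pred == 1, gt == 0).sum()
    fn = np.logical_and(pred == 0, gt == 1).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gt = np.zeros((4, 4), dtype=int)
gt[1:3, 1:3] = 1        # true tampered region (4 pixels)
pred = np.zeros((4, 4), dtype=int)
pred[1:3, 1:4] = 1      # slightly over-predicted region (6 pixels)

p, r, f = pixel_metrics(pred, gt)
# Over-prediction lowers precision (4/6) while recall stays at 1.0.
print(round(p, 3), round(r, 3), round(f, 3))
```

Average precision additionally integrates precision over recall levels as the detection threshold varies, which is why it is sensitive to both false detections and missed detections at once.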
Conclusion
Detecting spliced tampering in substation images is a key task in ensuring the stability of a power system. This study designs a new splicing tampering detection model for complex substation images based on dual channels: a feature pyramid structure Transformer and a shallow CNN. The feature pyramid structure Transformer channel obtains rich semantic information and visual features of tampered images through a global interaction mechanism, enhancing the accuracy and multi-scale processing capability of the detection model. As an auxiliary channel, the shallow CNN focuses on extracting edge features from the residual image, making it easier for the model to locate tampered regions within the overall contour. The models are evaluated on different splicing tampering datasets, and the proposed model achieves the best results on all of them. The visualizations further show that the proposed model exhibits the best detection effect in actual substation scenarios. However, this work investigates only image splicing tampering detection, while diverse types of tampering occur in practice. Future work will investigate the detection of other tampering types to improve the generality of tampering detection models.
Keywords: substation image; splicing tampering detection; Transformer; convolutional neural network (CNN); dual-channel network; feature pyramid structure; shallow network