融合多重机制的SAR舰船检测
SAR ship detection with multi-mechanism fusion
2024, Vol. 29, No. 2, Pages 545-558
Print publication date: 2024-02-16
DOI: 10.11834/jig.230166
肖振久, 林渤翰, 曲海成. 2024. 融合多重机制的SAR舰船检测. 中国图象图形学报, 29(02):0545-0558
Xiao Zhenjiu, Lin Bohan, Qu Haicheng. 2024. SAR ship detection with multi-mechanism fusion. Journal of Image and Graphics, 29(02):0545-0558
Objective
Synthetic aperture radar (SAR) images suffer from strong noise and indistinct imaging features, so targets are particularly prone to false and missed detections in complex scenes. To address this problem, a SAR ship detection method that fuses multiple mechanisms is proposed to improve detection accuracy.
Method
In the preprocessing stage, a U-Net denoising module is designed to suppress coherent speckle interference by adjusting the range of the noise variance parameter L. In the YOLOv7 (you only look once v7) backbone, an MLAN_SC (maxpooling layer aggregation network that incorporates selective kernel and contextual Transformer) structure is constructed, and the SK (selective kernel) channel attention mechanism is added to the downsampling stage to strengthen key information extraction and feature representation. To balance the upper- and lower-branch features of the MP (multiple pooling) structure and reduce false detections, a contextual Transformer block (COT) is integrated; it extracts contextual information with convolutions and combines local and global information so that image features are extracted more effectively. SPD convolution (space-to-depth convolution, SPD-Conv) is introduced into the detection head to strengthen small-target detection. Finally, the CIoU (complete intersection over union) loss function is replaced with the WIoU (wise intersection over union) loss function, whose dynamic focusing mechanism improves target localization on complex images.
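The noise variance parameter L above can be read, under the usual SAR speckle model, as the equivalent number of looks: multiplicative speckle in an L-look intensity image is commonly modeled as Gamma-distributed noise with unit mean and variance 1/L, so sweeping L changes how strongly a synthetic training pair is corrupted. The following is a minimal NumPy sketch of that standard model for generating noisy/clean pairs for a denoiser such as the U-Net module described above; it is an illustrative assumption, not the authors' implementation.

```python
import numpy as np

def add_speckle(intensity: np.ndarray, looks: float) -> np.ndarray:
    """Apply multiplicative speckle to a clean intensity image.

    For an L-look intensity image, speckle is commonly modeled as
    Gamma(shape=L, scale=1/L) noise: unit mean, variance 1/L, so a
    smaller `looks` value means stronger (higher-variance) speckle.
    """
    noise = np.random.gamma(shape=looks, scale=1.0 / looks, size=intensity.shape)
    return intensity * noise

# Hypothetical usage: build noisy/clean pairs over a range of L values,
# e.g. to train a U-Net style despeckling network.
clean = np.random.rand(256, 256).astype(np.float32)
pairs = [(add_speckle(clean, L), clean) for L in (1, 2, 4, 8)]
```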
Result
Comparative experiments on the SSDD (SAR ship detection dataset) and HRSID (high-resolution SAR images dataset) datasets show that, compared with YOLOv7, the improved method reaches AP (average precision) values of 99.25% and 89.73%, gains of 4.38% and 2.57%, respectively, with precision and recall of 98.41% and 93.24% on SSDD and 94.79% and 81.83% on HRSID, outperforming the comparison methods.
Conclusion
By fusing multiple mechanisms into YOLOv7, the proposed method strengthens target localization, markedly reduces false and missed detections of ships in complex scenes, and further improves the accuracy of SAR ship detection.
Objective
In recent years, synthetic aperture radar (SAR) has become increasingly valuable for maritime surveillance and vessel detection owing to its all-weather, day-and-night imaging capability. Because SAR penetrates clouds and fog, it provides high-quality imaging of the sea surface under a wide range of weather conditions. However, SAR images are frequently degraded by strong noise and indistinct imaging features, which can lead to false and missed detections in complex maritime environments. In response to this challenge, this study presents an approach that combines deep learning and computer vision techniques to improve the accuracy of SAR ship detection. By incorporating several critical enhancements into the YOLOv7 algorithm, the proposed method aims to identify and track vessels on the sea surface more accurately. This capability matters for maritime security and surveillance systems, because accurate and reliable vessel detection is paramount to the safety and security of shipping lanes and ports worldwide.
Method
The present study proposes a novel method that offers significant improvements to the YOLOv7 algorithm for SAR ship detection. In particular, a U-Net denoising module is designed in the preprocessing stage to suppress coherent speckle noise interference by leveraging deep learning to model the range of the noise variance parameter L. Moreover, the MLAN_SC (maxpooling layer aggregation network that incorporates selective kernel and contextual Transformer) structure is built in the YOLOv7 backbone network: the selective kernel (SK) attention mechanism is introduced into the downsampling stage to enhance key information extraction and deep feature expression. The contextual Transformer (COT) block is integrated into the backbone network to balance the upper- and lower-branch features of the multiple pooling (MP) structure and thus reduce false detections. The COT block uses convolutional operations to extract contextual information and combines local and global information for more effective feature extraction. In addition, space-to-depth convolution (SPD-Conv) is incorporated into the detection head to enhance small-object detection capability. This study further replaces the complete intersection over union (CIoU) loss function with the wise intersection over union (WIoU) loss function and applies its dynamic focusing mechanism to enhance target localization performance on complex images.
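To make the SPD-Conv component concrete, the block below follows the generic space-to-depth design of Sunkara and Luo (2022): a scale-2 space-to-depth rearrangement maps a (C, H, W) feature map to (4C, H/2, W/2), and a non-strided convolution then mixes the channels, so resolution is halved without discarding pixels the way strided convolution or pooling does. This PyTorch sketch shows the generic building block, not the exact layer configuration of the modified detection head:

```python
import torch
import torch.nn as nn

class SPDConv(nn.Module):
    """Space-to-depth followed by a non-strided convolution (scale = 2)."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # After space-to-depth, the channel count grows by scale**2 = 4.
        self.conv = nn.Conv2d(4 * in_channels, out_channels, kernel_size=3,
                              stride=1, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Rearrange each 2x2 spatial neighborhood into the channel dimension:
        # (N, C, H, W) -> (N, 4C, H/2, W/2), losing no information.
        x = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                       x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.act(self.bn(self.conv(x)))

# Example: halve the spatial size of a 64-channel feature map.
feat = torch.randn(1, 64, 80, 80)
out = SPDConv(64, 128)(feat)   # -> (1, 128, 40, 40)
```

The loss replacement can be sketched in the same spirit. The abstract does not state which WIoU variant is adopted; the function below implements WIoU v1 as defined by Tong et al. (2023), where the IoU loss is rescaled by an exponential attention term built from the distance between box centers, with the enclosing-box denominator detached from the gradient:

```python
import torch

def wiou_v1_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """WIoU v1 for axis-aligned boxes given as (..., 4) tensors of (x1, y1, x2, y2)."""
    # Intersection and union for the plain IoU term.
    inter_w = (torch.min(pred[..., 2], target[..., 2]) -
               torch.max(pred[..., 0], target[..., 0])).clamp(min=0)
    inter_h = (torch.min(pred[..., 3], target[..., 3]) -
               torch.max(pred[..., 1], target[..., 1])).clamp(min=0)
    inter = inter_w * inter_h
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Squared distance between the predicted and ground-truth box centers.
    dx = (pred[..., 0] + pred[..., 2] - target[..., 0] - target[..., 2]) / 2
    dy = (pred[..., 1] + pred[..., 3] - target[..., 1] - target[..., 3]) / 2

    # Size of the smallest enclosing box; detached so it only rescales the loss.
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    r_wiou = torch.exp((dx ** 2 + dy ** 2) / (cw ** 2 + ch ** 2 + eps).detach())
    return r_wiou * (1 - iou)
```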
Result
We employed ImageNet-pretrained network weights to train our model. The experimental data were selected from the SAR ship detection dataset (SSDD), which contains 1 160 SAR images and 2 456 ship targets. The dataset primarily includes data from the RadarSat-2, Sentinel-1, and TerraSAR-X sensors. The target areas were cropped to 500 pixels in four polarization modes (HH, VV, HV, and VH) and then labeled in PASCAL VOC format. Our deep learning framework was implemented in Python, and the input image size was adjusted to 640 × 640 during training. The momentum parameter was set to 0.93, the initial learning rate was set to 0.001, and the learning rate was attenuated with the cosine annealing method. An NVIDIA GeForce RTX 3060 GPU was used to accelerate stochastic gradient descent and iterate the model. Multiple ablation experiments were conducted on the SSDD dataset to validate the effectiveness of the proposed module improvements, with the original YOLOv7 network as the baseline for comparison. The baseline algorithm achieved an accuracy of 94.87%, while the addition of the denoising module resulted in more precise extraction of targets in complex backgrounds, leading to an improvement in accuracy. Incorporating the SK attention mechanism to construct a feature capture sampling structure significantly benefited SAR ship detection, enhancing the representation of deep-level features and the extraction of key information, reducing false positives, and further improving detection accuracy. The integration of the SPD-Conv module and the WIoU loss function helped the model focus on targets in complex scenes, improving localization performance and enhancing the detection of small, dense targets in the deep sea. The proposed method achieved the best AP@0.5 (99.25%) and AP@0.5:0.95 (71.21%) on the SSDD dataset, which were 4.38% and 9.19% higher than the YOLOv7 baseline, respectively, demonstrating the effectiveness of the proposed module improvements. Comparative experiments were conducted against YOLOv7 and other popular deep learning-based object detection algorithms, such as SSD, Faster R-CNN, RetinaNet, CenterNet, and FENDet, in terms of accuracy, recall, average precision, and time. The results showed that the proposed method had a recall rate 16.18% higher than that of Faster R-CNN and an accuracy rate 14.45% higher than that of RetinaNet. Furthermore, the proposed method handled missed detections and false positives well. The precision-recall (PR) curve comparison indicated that the proposed algorithm delivers excellent detection performance with a stable PR curve. The detection results of different algorithms on the SSDD dataset were also compared: although several methods detected ship targets effectively, the proposed method achieved the highest ship detection accuracy and handled missed detections and false positives best. Overall, the proposed algorithm is highly feasible and practical for SAR ship detection.
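For readers who want to reproduce the training setup, the hyperparameters reported above (640 × 640 inputs, momentum 0.93, initial learning rate 0.001, cosine-annealed stochastic gradient descent) map onto a standard PyTorch configuration roughly as follows. The model constructor, epoch count, and dataloader are placeholders rather than the authors' code:

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR

EPOCHS = 300           # placeholder; the abstract does not list the epoch count
model = build_model()  # placeholder for the modified YOLOv7 network
train_loader = ...     # placeholder dataloader yielding 640 x 640 images

# SGD with the momentum and initial learning rate reported in the experiments.
optimizer = SGD(model.parameters(), lr=0.001, momentum=0.93)
# Cosine annealing decays the learning rate over the course of training.
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)

for epoch in range(EPOCHS):
    for images, targets in train_loader:
        loss = model(images, targets)   # placeholder composite detection loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```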
Conclusion
In this study, we propose an optimized version of the YOLOv7 algorithm to improve the accuracy of SAR ship detection. Our approach integrates multiple mechanisms to enhance information extraction and overcome the challenges posed by noisy and complex images. In particular, we introduce a noise removal module that effectively suppresses noise interference. The integration of the attention mechanism and the self-attention mechanism strengthens feature extraction and enhances the discriminative learning ability of deep features. In addition, we incorporate SPD convolution and optimize the loss function to improve target localization, markedly reducing false and missed detections of ships against complex backgrounds and of dense small targets in near-shore SAR ship detection.
SAR image; ship detection; YOLOv7; attention mechanism; contextual Transformer; space-to-depth convolution (SPD-Conv); WIoU loss function
Bochkovskiy A, Wang C Y and Liao H Y M. 2020. YOLOv4: optimal speed and accuracy of object detection [EB/OL]. [2023-03-20]. https://arxiv.org/pdf/2004.10934.pdf
Dalsasso E, Yang X L, Denis L, Tupin F and Yang W. 2020. SAR image despeckling by deep neural networks: from a pre-trained model to an end-to-end training strategy. Remote Sensing, 12(16): #2636 [DOI: 10.3390/rs12162636]
Duan K W, Bai S, Xie L X, Qi H G, Huang Q M and Tian Q. 2019. CenterNet: keypoint triplets for object detection//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 6568-6577 [DOI: 10.1109/ICCV.2019.00667]
Gao Y L, Wu Z Y, Ren M and Wu C. 2022. Improved YOLOv4 based on attention mechanism for ship detection in SAR images. IEEE Access, 10: 23785-23797 [DOI: 10.1109/ACCESS.2022.3154474]
Gong S R, Xu S J, Zhou L F, Zhu J and Zhong S. 2022. Deformable atrous convolution nearshore SAR small ship detection incorporating mixed attention. Journal of Image and Graphics, 27(12): 3663-3676 [DOI: 10.11834/jig.210866]
Guo W, Shen L, Qu H C, Wang Y X and Lin C. 2022. Ship detection in SAR images based on adaptive weight pyramid and branch strong correlation. Journal of Image and Graphics, 27(10): 3127-3138 [DOI: 10.11834/jig.210373]
Guo Y, Chen S Q, Zhan R H, Wang W and Zhang J. 2022. LMSD-YOLO: a lightweight YOLO algorithm for multi-scale SAR ship detection. Remote Sensing, 14(19): #4801 [DOI: 10.3390/rs14194801]
Hu J, Shen L and Sun G. 2018. Squeeze-and-excitation networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7132-7141 [DOI: 10.1109/CVPR.2018.00745]
Li X, Wang W H, Hu X L and Yang J. 2019. Selective kernel networks//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 510-519 [DOI: 10.1109/CVPR.2019.00060]
Li Y H, Yao T, Pan Y W and Mei T. 2023. Contextual Transformer networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(2): 1489-1500 [DOI: 10.1109/TPAMI.2022.3164083]
Lin T Y, Goyal P, Girshick R, He K M and Dollar P. 2017. Focal loss for dense object detection//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2999-3007 [DOI: 10.1109/ICCV.2017.324]
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y and Berg A C. 2016. SSD: single shot MultiBox detector//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 21-37 [DOI: 10.1007/978-3-319-46448-0_2]
Liu Z, Lin Y T, Cao Y, Hu H, Wei Y X, Zhang Z, Lin S and Guo B N. 2021. Swin Transformer: hierarchical vision Transformer using shifted windows//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 9992-10002 [DOI: 10.1109/ICCV48922.2021.00986]
Redmon J, Divvala S, Girshick R and Farhadi A. 2016. You only look once: unified, real-time object detection//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 779-788 [DOI: 10.1109/CVPR.2016.91]
Redmon J and Farhadi A. 2017. YOLO9000: better, faster, stronger//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6517-6525 [DOI: 10.1109/CVPR.2017.690]
Redmon J and Farhadi A. 2018. YOLOv3: an incremental improvement [EB/OL]. [2023-03-20]. https://arxiv.org/pdf/1804.02767.pdf
Ren S Q, He K M, Girshick R and Sun J. 2017. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6): 1137-1149 [DOI: 10.1109/TPAMI.2016.2577031]
Ruan C, Guo H and An J B. 2021. SAR inshore ship detection algorithm in complex background. Journal of Image and Graphics, 26(5): 1058-1066 [DOI: 10.11834/jig.200266]
Sunkara R and Luo T. 2022. No more strided convolutions or pooling: a new CNN building block for low-resolution images and small objects [EB/OL]. [2023-03-20]. https://arxiv.org/pdf/2208.03641.pdf
Tan X D and Peng H. 2022. Improved YOLOv5 ship target detection in SAR image. Computer Engineering and Applications, 58(4): 247-254 [DOI: 10.3778/j.issn.1002-8331.2108-0308]
Tong Z J, Chen Y H, Xu Z W and Yu R. 2023. Wise-IoU: bounding box regression loss with dynamic focusing mechanism [EB/OL]. [2023-03-20]. https://arxiv.org/pdf/2301.10051.pdf
Wang C Y, Bochkovskiy A and Liao H Y M. 2022. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors [EB/OL]. [2023-03-20]. https://arxiv.org/pdf/2207.02696.pdf
Wei S J, Zeng X F, Qu Q Z, Wang M, Su H and Shi J. 2020. HRSID: a high-resolution SAR images dataset for ship detection and instance segmentation. IEEE Access, 8: 120234-120254 [DOI: 10.1109/ACCESS.2020.3005861]
Yu J M, Wu T, Zhou S B, Pan H L, Zhang X and Zhang W. 2022. An SAR ship object detection algorithm based on feature information efficient representation network. Remote Sensing, 14(14): #3489 [DOI: 10.3390/rs14143489]
Zhang T W, Zhang X L, Li J W, Xu X W, Wang B Y, Zhan X, Xu Y Q, Ke X, Zeng T J, Su H, Ahmad I, Pan D C, Liu C, Zhou Y, Shi J and Wei S J. 2021. SAR ship detection dataset (SSDD): official release and comprehensive data analysis. Remote Sensing, 13(18): #3690 [DOI: 10.3390/rs13183690]