Aerial remote sensing target detection with enhanced small-target features

Wenqing Zhao; Zixu Kong; Zhendong Zhou; Zhenbing Zhao

doi:10.11834/jig.190612

Remote sensing image processing

Target detection algorithm of aerial remote sensing based on feature enhancement technology
Vol. 26, No. 3 (2021), pp. 644-653
Published in print: 2021-03-16
Accepted: 2020-06-15

Wenqing Zhao, Zixu Kong, Zhendong Zhou, Zhenbing Zhao. Target detection algorithm of aerial remote sensing based on feature enhancement technology[J]. Journal of Image and Graphics, 2021, 26(3): 644-653. DOI: 10.11834/jig.190612.

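Summary

Objective

Targets in aerial remote sensing images are mostly small, arbitrarily oriented, and set against complex backgrounds. In conventional object detection algorithms, the feature extraction network downsamples the input image many times, so resolution drops sharply and target feature information is easily lost. In addition, feature maps at different scales are not fused effectively, and similar features shared among the targets to be detected are not effectively associated. The result is high time complexity and insufficient feature information, which lead to high missed-detection and false-detection rates. To improve detection accuracy for targets in aerial remote sensing images, this paper proposes an object detection algorithm based on a parallel high-resolution structure combined with a long short-term memory (LSTM) network.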

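Method

First, a parallel high-resolution network structure is built: a high-resolution subnetwork serves as the first stage, subnetworks are added step by step from high to low resolution, and the subnetworks are connected in parallel. While the subnetworks are built, feature maps of different resolutions are fused repeatedly to strengthen the representation of target features. Second, the feature maps extracted by each subnetwork are upsampled by bilinear interpolation and their channel features are concatenated. Finally, a bidirectional LSTM integrates the channel feature information to perform multiscale detection.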
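The second step, bilinear upsampling followed by channel concatenation, can be sketched in plain Python. This is only an illustration under stated assumptions: feature maps are nested lists, corner-aligned sampling is assumed, and the names `bilinear_upsample` and `fuse` are hypothetical. The learned convolutions of the actual network are not modeled.

```python
from typing import List

Channel = List[List[float]]  # one feature-map channel as rows of values


def bilinear_upsample(ch: Channel, out_h: int, out_w: int) -> Channel:
    """Resize one channel by bilinear interpolation (corner-aligned, assumed)."""
    in_h, in_w = len(ch), len(ch[0])
    out = []
    for i in range(out_h):
        # Map the output row index to a fractional source row.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = ch[y0][x0] * (1 - wx) + ch[y0][x1] * wx
            bottom = ch[y1][x0] * (1 - wx) + ch[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def fuse(branches: List[List[Channel]]) -> List[Channel]:
    """Upsample every branch to the size of the first branch and stack channels."""
    out_h, out_w = len(branches[0][0]), len(branches[0][0][0])
    fused = []
    for branch in branches:
        for ch in branch:
            fused.append(bilinear_upsample(ch, out_h, out_w))
    return fused


# Hypothetical inputs: a 1-channel 4x4 branch and a 2-channel 2x2 branch.
high = [[[1.0, 2.0, 3.0, 4.0] for _ in range(4)]]
low = [[[0.0, 2.0], [4.0, 6.0]], [[1.0, 1.0], [1.0, 1.0]]]
fused = fuse([high, low])
print(len(fused), len(fused[0]), len(fused[0][0]))  # 3 4 4
```

After fusion every channel shares the resolution of the highest-resolution branch, so the channel count is simply the sum over branches.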

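Results

The proposed algorithm was evaluated on the COCO (common objects in context) 2017 dataset, the KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago) vehicle detection dataset, and the UCAS-AOD (University of Chinese Academy of Sciences-Aerial Object Detection) aerial remote sensing dataset, where it reached a mean average precision (mAP) of 41.6%, 69.4%, and 69.3%, respectively. Compared with the SSD513 algorithm, mAP on COCO 2017, KITTI, and UCAS-AOD improved by 10.46%, 7.3%, and 8.8%, respectively.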

Conclusion

The proposed method effectively improves the mean average precision for targets in aerial remote sensing images.

Abstract

Objective

Target detection in aerial remote sensing images has many military and civilian applications. On the one hand, the spatial resolution of remote sensing images keeps increasing as the technology improves. On the other hand, detection can be applied to urban traffic planning, military target tracking, ground object classification, and other tasks. Most advanced target detection algorithms, such as the fast region-based convolutional neural network (Fast R-CNN), Mask R-CNN, and the single shot multibox detector (SSD), are evaluated on general-purpose datasets. However, a classifier trained on a general-purpose dataset does not detect targets well in aerial remote sensing images, primarily because of the particular characteristics of these images.

The first characteristic is scale diversity. Aerial remote sensing images are taken from heights of several hundred meters up to 10 000 m, so objects of the same kind appear at different sizes. Taking ships in a port as an example, a very large ship is nearly 400 m long, whereas a small ship is only tens of meters long. The second is the particularity of perspective. Aerial remote sensing images are shot from high altitude, so every object appears in top view, which differs considerably from the horizontal perspective of commonly used datasets and leads to poor performance when a trained detector is applied to remote sensing images in practice. The third is the small-target problem. Most targets in aerial remote sensing images are small (tens of pixels or even only a few pixels) and therefore carry very little information. Mainstream detection algorithms perform poorly on them, mainly because detectors based on convolutional neural networks use pooling layers, which further reduce the already limited information. For example, a 24×24 pixel target is reduced to 1×1 pixel after four pooling layers, which is too low-dimensional to classify. The fourth is background complexity. Because the image is taken from high altitude, its field of view is large (usually several square kilometers) and contains a great variety of backgrounds, so small targets blend into the background, which strongly interferes with detection.

In general, small targets in remote sensing images have a low recognition rate, diverse scales, disordered orientations, and complex backgrounds. On the one hand, edge information is lost when a small target is pooled. On the other hand, the semantic information of the feature map is not strong enough to detect the corresponding target. This paper proposes a parallel high-resolution network structure combined with long short-term memory (LSTM) to replace the visual geometry group 16-layer network (VGG16) that serves as the base detection network of SSD, and thereby to improve detection accuracy for aerial targets.
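The 24×24 example can be checked with a few lines of arithmetic. The sketch below assumes unpadded 2×2 pooling with stride 2 and floor rounding, which the abstract does not state explicitly; `pooled_size` is a hypothetical helper name.

```python
def pooled_size(size: int, layers: int, kernel: int = 2, stride: int = 2) -> int:
    """Side length of a square map after `layers` unpadded pooling layers."""
    for _ in range(layers):
        # Standard output-size formula with floor rounding and no padding.
        size = (size - kernel) // stride + 1
    return size


# Side length of a 24x24 target after 0, 1, 2, 3 and 4 pooling layers.
sizes = [pooled_size(24, n) for n in range(5)]
print(sizes)  # [24, 12, 6, 3, 1]
```

Under these assumptions the target halves at each layer (24, 12, 6, 3) and the fourth layer floors 3/2 down to a single pixel.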

Method

This paper introduces the high-resolution network (HRNet) and an LSTM network into the SSD model. The defining feature of the parallel HRNet structure is that a high-resolution representation of the input image is maintained throughout. This differs from the traditional approach, which extracts features top-down, from high to low resolution, and then upsamples to restore the feature size. The parallel structure effectively reduces the number of downsampling operations and the loss of edge feature information for the targets to be detected. The LSTM network is a variant of the recurrent neural network (RNN). An RNN cannot be trained deeply because of vanishing gradients; the LSTM combines short-term memory with long-term memory through gating, which alleviates the vanishing- and exploding-gradient problems to a certain extent. First, the parallel high-resolution feature map approach of HRNet is used to build the residual modules. The first stage is a high-resolution subnetwork; subnetworks are then added gradually from high to low resolution, and the multistage subnetworks are connected in parallel. Second, features are fused repeatedly to obtain rich feature information. Finally, the feature maps of the subnetworks are upsampled and fused, the channel information is integrated with a bidirectional LSTM, and context information is used effectively to perform multiscale detection.
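To make the gating idea concrete, here is a minimal single-unit LSTM run in both directions over a short sequence. It is a sketch under assumptions, not the network used in the paper: the weights in `W` are hypothetical, the input sequence merely stands in for per-channel descriptors, and the abstract does not specify how features are actually ordered and fed to the LSTM.

```python
import math
from typing import Dict, List, Tuple

Weights = Dict[str, Tuple[float, float, float]]  # gate -> (w_x, w_h, bias)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x: float, h: float, c: float, w: Weights) -> Tuple[float, float]:
    """One step of a single-unit LSTM cell; returns (new h, new c)."""
    def gate(name: str, act) -> float:
        w_x, w_h, b = w[name]
        return act(w_x * x + w_h * h + b)

    i = gate("input", sigmoid)     # how much new content to write
    f = gate("forget", sigmoid)    # how much old cell state to keep
    o = gate("output", sigmoid)    # how much of the cell state to expose
    g = gate("cand", math.tanh)    # candidate content
    c_new = f * c + i * g          # long-term memory (cell state)
    h_new = o * math.tanh(c_new)   # short-term memory (hidden state)
    return h_new, c_new


def run(seq: List[float], w: Weights) -> List[float]:
    """Hidden states of the cell over a sequence, starting from zero state."""
    h, c, outs = 0.0, 0.0, []
    for x in seq:
        h, c = lstm_step(x, h, c, w)
        outs.append(h)
    return outs


def bidirectional(seq: List[float], w_fwd: Weights, w_bwd: Weights):
    """Pair each position's forward state with its backward state."""
    fwd = run(seq, w_fwd)
    bwd = run(seq[::-1], w_bwd)[::-1]
    return list(zip(fwd, bwd))


# Hypothetical weights and a toy sequence of channel descriptors.
W = {"input": (0.5, 0.1, 0.0), "forget": (0.3, 0.2, 1.0),
     "output": (0.4, 0.1, 0.0), "cand": (0.8, 0.3, 0.0)}
seq = [0.2, -0.5, 0.9, 0.1]
states = bidirectional(seq, W, W)
print(len(states), len(states[0]))  # 4 2
```

Each position ends up with two values, one summarizing what came before it and one summarizing what comes after, which is how a bidirectional LSTM exposes context from both sides.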

Result

The improved network is applied to the SSD algorithm and compared with the original SSD method on the common objects in context (COCO) 2017 dataset, the Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago (KITTI) dataset, and the University of Chinese Academy of Sciences-Aerial Object Detection (UCAS-AOD) aerial target dataset. On COCO 2017, the model reaches a mean average precision (mAP) of 41.6%, which is 10.4% higher than that of SSD513 + ResNet101. On KITTI and UCAS-AOD, the mAP is 69.4% and 69.3%, respectively. Compared with SSD513, the mAP on COCO 2017, KITTI, and UCAS-AOD increases by 10.4%, 7.3%, and 8.8%, respectively.
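As a quick consistency check, the SSD513 baselines implied by these figures can be derived by subtraction. This assumes the reported gains are absolute differences in mAP (percentage points), which the abstract does not state; the derived values are not results reported by the authors.

```python
# mAP values and gains over SSD513 as quoted in the abstract (percent).
reported_map = {"COCO 2017": 41.6, "KITTI": 69.4, "UCAS-AOD": 69.3}
gain_over_ssd513 = {"COCO 2017": 10.4, "KITTI": 7.3, "UCAS-AOD": 8.8}

# Assumption: gains are absolute, so baseline = reported mAP - gain.
implied_baseline = {
    name: round(reported_map[name] - gain_over_ssd513[name], 1)
    for name in reported_map
}

for name, value in implied_baseline.items():
    print(f"{name}: implied SSD513 mAP of {value}%")
```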

Conclusion

The results show that this method reduces the missed-detection rate for small targets and improves the average detection accuracy over all targets.

Keywords

aerial remote sensing image; machine vision; small target detection; parallel high resolution network; long short-term memory (LSTM); COCO dataset; UCAS-AOD dataset
