增强小目标特征的航空遥感目标检测
Target detection algorithm of aerial remote sensing based on feature enhancement technology
2021, Vol. 26, No. 3, pp. 644-653
Print publication date: 2021-03-16
Accepted: 2020-06-15
DOI: 10.11834/jig.190612
赵文清, 孔子旭, 周震东, 赵振兵. 增强小目标特征的航空遥感目标检测[J]. 中国图象图形学报, 2021,26(3):644-653.
Wenqing Zhao, Zixu Kong, Zhendong Zhou, Zhenbing Zhao. Target detection algorithm of aerial remote sensing based on feature enhancement technology[J]. Journal of Image and Graphics, 2021,26(3):644-653.
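Objective
Targets in aerial remote sensing images are mostly small, arbitrarily oriented, and set against complex backgrounds. In traditional object detection algorithms, the feature extraction network downsamples the input image several times, which greatly reduces the resolution and easily loses target feature information. In addition, feature maps of different scales are not fused effectively, and similar features shared among the targets are not associated effectively. As a result, time complexity is high, the extracted feature information is insufficient, and the missed detection and false detection rates are high. To improve detection accuracy for targets in aerial remote sensing images, this paper proposes an object detection algorithm based on a parallel high-resolution structure combined with a long short-term memory (LSTM) network.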
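Method
First, a parallel high-resolution network structure is built: a high-resolution subnetwork forms the first stage, subnetworks are then added one by one from high to low resolution, and the subnetworks are connected in parallel. While the subnetworks are built, feature maps of different resolutions are fused repeatedly to strengthen the target feature representation. Second, the feature maps extracted by each subnetwork are upsampled by bilinear interpolation and their channel features are concatenated. Finally, a bidirectional LSTM integrates the channel feature information to complete multiscale detection.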
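The upsample-and-concatenate step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `bilinear_upsample` and `upsample_and_concat`, the nested-list layout of the feature maps, the align-corners sampling convention, and the toy values are all hypothetical choices made for the example.

```python
def bilinear_upsample(fmap, out_h, out_w):
    """Resize one 2D feature map (list of rows) by bilinear interpolation.

    Assumption: align-corners convention, i.e. the corner samples of the
    input and output grids coincide. The source does not state which
    convention is used.
    """
    in_h, in_w = len(fmap), len(fmap[0])
    out = []
    for i in range(out_h):
        # Map the output row index back to a fractional input row.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = fmap[y0][x0] * (1 - wx) + fmap[y0][x1] * wx
            bottom = fmap[y1][x0] * (1 - wx) + fmap[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def upsample_and_concat(branches):
    """Bring every branch to the resolution of the first (highest-resolution)
    branch and stack all channels into one list.

    `branches` is a list of feature tensors; each tensor is a list of
    channels, and each channel is a 2D list of floats.
    """
    target_h = len(branches[0][0])
    target_w = len(branches[0][0][0])
    fused = []
    for channels in branches:
        for fmap in channels:
            fused.append(bilinear_upsample(fmap, target_h, target_w))
    return fused


# Toy input: one 4x4 high-resolution channel and two 2x2 low-resolution channels.
high = [[[float(r * 4 + c) for c in range(4)] for r in range(4)]]
low = [[[0.0, 3.0], [6.0, 9.0]], [[1.0, 1.0], [1.0, 1.0]]]
fused = upsample_and_concat([high, low])
print(len(fused), len(fused[0]), len(fused[0][0]))  # 3 4 4
```

The fused tensor keeps the spatial size of the highest-resolution branch, and its channel count is the sum of the branch channel counts. In the paper this stacked channel information is what the bidirectional LSTM then integrates.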
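Result
The proposed detection algorithm is validated experimentally on the COCO (common objects in context) 2017 dataset, the KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago) vehicle detection dataset, and the UCAS-AOD (University of Chinese Academy of Sciences-Aerial Object Detection) aerial remote sensing dataset, where the mean average precision (mAP) is 41.6%, 69.4%, and 69.3%, respectively. Compared with the SSD513 algorithm, the mAP of the proposed algorithm on COCO 2017, KITTI, and UCAS-AOD is higher by 10.46%, 7.3%, and 8.8%, respectively.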
Conclusion
The proposed method effectively improves the average detection accuracy for targets in aerial remote sensing images.
Objective
Object detection in aerial remote sensing images has many military and civilian applications. On the one hand, the spatial resolution of remote sensing images keeps increasing as technology improves. On the other hand, such detection can be applied to urban traffic planning, military target tracking, ground object classification, and other tasks. Most advanced object detection algorithms, such as Fast region-based convolutional neural network (Fast R-CNN), Mask R-CNN, and single shot multibox detector (SSD), are tested on general-purpose datasets. However, a detector trained on general-purpose datasets performs poorly on aerial remote sensing images, primarily because of four characteristics of these images. The first is scale diversity: aerial remote sensing images are taken from heights ranging from several hundred meters up to 10 000 m, so objects of the same class appear at different sizes. Taking ships in a port as an example, a very large ship is nearly 400 m long, whereas a small ship is only tens of meters long. The second is the particularity of perspective: aerial remote sensing images are shot from high altitude, so all objects appear in top view, which differs considerably from the horizontal perspective of commonly used datasets. As a result, trained detection algorithms perform poorly when applied to remote sensing images. The third is the small target problem: most targets in aerial remote sensing images are small (tens of pixels or even only a few pixels) and carry very little information. Mainstream detection algorithms handle these targets poorly, mainly because detection methods based on convolutional neural networks use pooling layers, which further reduce the already small amount of information. For example, a target of 24×24 pixels is reduced to 1×1 pixel after four pooling layers, which is too low-dimensional to classify. The fourth is high background complexity: because aerial remote sensing images are taken from high altitude, the field of view is large (usually covering several square kilometers) and the image contains a wide variety of background content, so small targets blend into the background, which strongly interferes with detection. In summary, small targets in remote sensing images have a low recognition rate, diverse scales, disordered orientations, and complex backgrounds. On the one hand, edge information is lost when a small target is pooled. On the other hand, the semantic information of the feature map is not strong enough to detect the corresponding target. This paper proposes a parallel high-resolution network structure combined with long short-term memory (LSTM) to replace visual geometry group 16-layer net (VGG16), the base detection network of SSD, and thereby improve detection accuracy for aerial targets.
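The pooling example in this paragraph (a 24×24 target shrinking to 1×1 after four pooling layers) can be checked with a few lines of arithmetic. The sketch below assumes 2×2 pooling windows with stride 2, no padding, and floor rounding; the abstract does not state these settings, and the function name `size_after_pooling` is hypothetical.

```python
def size_after_pooling(size, num_layers, kernel=2, stride=2):
    """Side length of a square region after `num_layers` pooling layers.

    Assumptions (not stated in the source): 2x2 window, stride 2,
    no padding, floor rounding of the output size.
    """
    for _ in range(num_layers):
        size = (size - kernel) // stride + 1
    return size


# Trace the side length of a 24x24 target through zero to four pooling layers.
sizes = [size_after_pooling(24, n) for n in range(5)]
print(sizes)  # [24, 12, 6, 3, 1]
```

Under these assumptions the target halves at each layer (24, 12, 6, 3) and the last layer floors 3 down to 1, which matches the 1×1 figure quoted above.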
Method
This paper introduces the high-resolution network (HRNet) and an LSTM network into the SSD model. The defining feature of the parallel HRNet structure is that a high-resolution representation of the input image is maintained throughout. This differs from the traditional approach, which extracts features top-down at decreasing resolution and then upsamples to restore the feature size. The parallel structure reduces the number of downsampling operations and the loss of edge feature information of the targets to be detected. The LSTM network is a variant of the recurrent neural network (RNN). An RNN cannot be trained deeply because of vanishing gradients. The LSTM network combines short-term memory with long-term memory through gating, which alleviates the vanishing gradient problem to a certain extent. To address the problem of gradient explosion, residual modules are first built following the parallel high-resolution feature-map design of HRNet. The first stage is a high-resolution subnetwork; subnetworks are then added one by one from high to low resolution, and the multistage subnetworks are connected in parallel. Second, repeated feature fusion is carried out to obtain rich feature information. Finally, the feature maps of the subnetworks are upsampled and fused, the channel information is integrated with a bidirectional LSTM, and context information is exploited to form multiscale detection.
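To make the staged parallel layout more concrete, the sketch below lists the branches that exist at each stage. It is only an illustration under assumptions taken from the general HRNet design rather than from this abstract: each newly added branch halves the resolution and doubles the channel count of the previous one. The input size 128, the base width 32, the four stages, and the function name `parallel_branches` are hypothetical.

```python
def parallel_branches(input_size, base_channels, num_stages):
    """Return, for every stage, the (resolution, channels) of each parallel branch.

    Assumptions (HRNet-style, not confirmed by the abstract): stage s holds
    s branches; branch k has resolution input_size / 2**k and
    base_channels * 2**k channels.
    """
    stages = []
    for stage in range(1, num_stages + 1):
        branches = [
            (input_size // (2 ** k), base_channels * (2 ** k))
            for k in range(stage)
        ]
        stages.append(branches)
    return stages


layout = parallel_branches(input_size=128, base_channels=32, num_stages=4)
for idx, branches in enumerate(layout, start=1):
    print(f"stage {idx}: {branches}")
```

The highest-resolution branch appears in every stage, which is what lets the network keep a high-resolution representation while lower-resolution branches are added alongside it and fused repeatedly.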
Result
The improved network is applied to the SSD algorithm and compared with the SSD method on the common objects in context (COCO) 2017 dataset, the Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago (KITTI) dataset, and the University of Chinese Academy of Sciences-Aerial Object Detection (UCAS-AOD) aerial target dataset. On COCO 2017, the mean average precision (mAP) of the model is 41.6%, which is 10.4% higher than that of SSD513 + ResNet101. On KITTI and UCAS-AOD, the mAP of the model is 69.4% and 69.3%, respectively. Compared with SSD513, the mAP of the proposed algorithm increases by 10.4%, 7.3%, and 8.8% on COCO 2017, KITTI, and UCAS-AOD, respectively.
Conclusion
The results show that the proposed method reduces the missed detection rate for small targets and improves the average detection accuracy over all targets.
Keywords: aerial remote sensing image; machine vision; small target detection; parallel high-resolution network; long short-term memory (LSTM); COCO dataset; UCAS-AOD dataset