Rotating target detection network that combines key points and guide vectors

She Haodong; Zhao Liangjin

doi:10.11834/jig.230207

Remote sensing image processing


2024, Vol. 29, No. 2, pp. 533-544
Print publication date: 2024-02-16

She Haodong, Zhao Liangjin. 2024. Rotating target detection network that combines key points and guide vectors. Journal of Image and Graphics, 29(02): 0533-0544. DOI: 10.11834/jig.230207.

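Summary

Objective

Object detection is one of the important research directions in intelligent remote sensing interpretation, yet most object detection algorithms struggle to detect densely arranged rotating objects with high precision. This paper proposes an object detection algorithm based on key point and guide vector prediction that achieves high-precision rotating object detection while also characterizing the orientation of each object.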

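Method

First, a new way of modeling rotating objects is proposed: detection is decomposed into the parameter regression of a center point, a head vertex, a guide vector, and the object width, so that the representation fits the detected object more closely. Second, a rotating elliptical Gaussian kernel is designed that better fits the shape of remote sensing objects and thereby improves key point prediction accuracy. Finally, by predicting the guide vector that points from the center point to the head vertex, the center point and head vertex belonging to the same object are matched, which yields an accurate rotated rectangular detection box with a direction.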
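The matching and box-generation steps described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names `match_heads` and `oriented_box`, the distance threshold `max_dist`, and the assumption that the head vertex lies on the long axis at the middle of the front edge (so that the box length is twice the center-to-head distance) are all hypothetical and not stated in the abstract.

```python
import math


def match_heads(centers, guides, heads, max_dist=5.0):
    """Pair each center with the detected head vertex nearest to center + guide.

    centers, heads: lists of (x, y) key points; guides: one (dx, dy) per center.
    max_dist is a hypothetical rejection threshold in pixels.
    Simplification: a head vertex may be claimed by more than one center.
    """
    pairs = []
    for (cx, cy), (gx, gy) in zip(centers, guides):
        px, py = cx + gx, cy + gy  # where the guide vector says the head should be
        best = min(heads, key=lambda h: math.hypot(h[0] - px, h[1] - py), default=None)
        if best is not None and math.hypot(best[0] - px, best[1] - py) <= max_dist:
            pairs.append(((cx, cy), best))
    return pairs


def oriented_box(center, head, width):
    """Build a rotated rectangle from a matched center, head vertex, and width.

    Assumes the head vertex is the midpoint of the front short edge.
    Returns the four corners and the heading angle in radians.
    """
    cx, cy = center
    hx, hy = head
    half_len = math.hypot(hx - cx, hy - cy)
    if half_len == 0.0:
        raise ValueError("center and head vertex coincide; direction is undefined")
    ux, uy = (hx - cx) / half_len, (hy - cy) / half_len  # unit vector along the long axis
    nx, ny = -uy, ux                                     # unit normal (short axis)
    hw = width / 2.0
    tx, ty = 2 * cx - hx, 2 * cy - hy                    # tail point, mirror of the head
    corners = [
        (hx + nx * hw, hy + ny * hw),
        (hx - nx * hw, hy - ny * hw),
        (tx - nx * hw, ty - ny * hw),
        (tx + nx * hw, ty + ny * hw),
    ]
    return corners, math.atan2(uy, ux)


pairs = match_heads([(10, 10), (50, 10)], [(8, 0), (0, 9)], [(50, 20), (18, 10)])
corners, angle = oriented_box(pairs[0][0], pairs[0][1], width=4)
```

Because the angle is taken from the center-to-head direction, it covers the full 360 degrees, which is what lets the box carry a heading rather than only an axis.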

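Result

Experiments on the HRSC (high-resolution ship collections) dataset of large-aspect-ratio ship objects show that the proposed algorithm obtains better detection results than other mainstream object detection algorithms, reaching an average precision of 90.78% and 97.85% under the VOC 2007 (visual object classes) and VOC 2012 metrics, respectively. On the UCAS-AOD (UCAS-high resolution aerial object detection dataset) dataset of small-aspect-ratio aircraft objects, it reaches an average precision of 98.81%. These results demonstrate the feasibility and effectiveness of the proposed algorithm.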

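Conclusion

The proposed algorithm uses elliptical Gaussian kernels to compute the center point and head vertex and designs a guide vector to constrain the point matching relationship, thereby achieving orientation detection for rotating objects.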

Abstract

Objective

Optical remote sensing images record surface features objectively and accurately, and they are widely used in the investigation, detection, analysis, and forecasting of resources, environment, disasters, regions, and cities. The primary task of object detection in optical remote sensing images is to locate and classify objects in the input images, which has important research and application value in the field of Earth observation. Traditional remote sensing object detection algorithms rely on manually designed features. Such features are limited and consume considerable human and material resources to design, yet they generalize poorly and are not accurate enough. With the rapid development of deep learning in recent years, deep learning-based algorithms have achieved good results in optical image object detection. In contrast with objects in natural scenes, objects in optical remote sensing images are rigid, and most of them carry key information such as direction. The horizontal rectangular detection boxes used in natural scenes cause problems in optical remote sensing object detection, such as excessive background area, overlap between adjacent detection boxes, and loss of object motion information. To detect objects in optical remote sensing images more accurately, a rotating rectangular box that fits the object contour is a more suitable choice. Detecting rotating remote sensing objects through key points is one of the current mainstream approaches. However, because remote sensing objects are often densely arranged, key point-based detection algorithms tend to suffer from overlapping adjacent key points and inaccurate key point detection.
To solve these key point regression problems, this study proposes an improved rotating elliptic Gaussian kernel together with a vector-guided point pair matching module, which achieves high-precision rotating object detection through the accurate prediction and matching of object center points and head vertices.

Method

Unlike general feature extraction networks, an hourglass network has a structure that fuses high-level features rich in semantic information with low-level features rich in spatial information. The resulting high-resolution feature map allows key points to be located precisely. The circular Gaussian kernel used to regress key points in natural scenes has two problems in remote sensing object detection: the Gaussian kernel radius is uncertain, and the kernels of densely arranged objects overlap. The rotating elliptical Gaussian kernel proposed in this study addresses these problems. Its long and short axes are determined by the length and width of the object's rotating rectangular box, and the angle of its long axis equals the angle of the object. This kernel fits the shape of the object more closely and thus achieves better key point regression. The model is built around two key points of the object, the center point and the head vertex, and a point pair matching module based on guide vectors is proposed to pair the center point and head vertex of the same object exactly.
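To make the kernel construction concrete, here is a minimal sketch of an elliptical Gaussian whose long axis follows the object angle. The abstract does not give the mapping from box size to standard deviation, so the rule used here (sigma equals the side length divided by a hypothetical `scale` of 6) is an assumption, as are the function names `rotated_gaussian` and `render_heatmap`.

```python
import math


def rotated_gaussian(x, y, cx, cy, length, width, theta, scale=6.0):
    """Value at (x, y) of an elliptical Gaussian centered on (cx, cy).

    length, width: sides of the object's rotated box; theta: object angle in radians.
    scale is a hypothetical factor turning a side length into a standard deviation.
    """
    sigma_l = length / scale  # spread along the long axis
    sigma_w = width / scale   # spread along the short axis
    dx, dy = x - cx, y - cy
    # Express the offset in the object's own (long axis, short axis) frame.
    u = dx * math.cos(theta) + dy * math.sin(theta)
    v = -dx * math.sin(theta) + dy * math.cos(theta)
    return math.exp(-0.5 * ((u / sigma_l) ** 2 + (v / sigma_w) ** 2))


def render_heatmap(h, w, cx, cy, length, width, theta):
    """Rasterize one key point target as an h-by-w list of rows."""
    return [
        [rotated_gaussian(x, y, cx, cy, length, width, theta) for x in range(w)]
        for y in range(h)
    ]


heat = render_heatmap(32, 32, cx=16, cy=16, length=24, width=6, theta=math.pi / 4)
```

With a circular kernel, two slender objects lying side by side would receive overlapping targets; here the response falls off quickly across the short axis, so neighboring peaks stay separated.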

Result

Our model is evaluated on the HRSC2016 and UCAS-AOD public datasets. The HRSC2016 dataset has 436 training images, 181 validation images, and 444 test images, with image sizes ranging from 300 × 300 to 1 500 × 900. The UCAS-AOD dataset has an image size of 1 280 × 659, with 1 000 aircraft images and 510 vehicle images containing 7 482 aircraft objects and 7 114 vehicle objects. The annotations in the HRSC dataset contain the head vertices. The annotations of the aircraft category in the UCAS-AOD dataset contain the specific orientation angles of the objects, so the head vertices of aircraft can be calculated. During the experiments, images of various sizes were cropped and scaled to a resolution of 640 × 640 before being fed into the network. Four Nvidia RTX 2080Ti graphics cards were used, with a batch size of eight images per card and an initial learning rate of 0.01. The optimizer was stochastic gradient descent with a momentum factor of 0.9. Before training, the dataset was augmented through flipping and rotation. Recall, accuracy, and average precision are used as the evaluation metrics. The experimental results on the HRSC dataset of large-aspect-ratio ship objects show that the proposed algorithm achieves better detection results than other mainstream object detection algorithms, with an average precision of 90.78% (VOC 2007) and 97.85% (VOC 2012), and its precision-recall curves are also better than those of the other algorithms.
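The two average precision figures differ because they are computed in different ways. Assuming "VOC 2007" and "VOC 2012" refer to the standard PASCAL VOC protocols (11-point interpolation for the former, area under the monotone precision envelope for the latter), the difference can be sketched as below. The function name `average_precision` and the toy curve are illustrative only and are not taken from the paper.

```python
def average_precision(recalls, precisions, use_07_metric=False):
    """Average precision from a precision-recall curve.

    recalls must be sorted in ascending order, with precisions[i] measured at recalls[i].
    """
    if use_07_metric:
        # VOC 2007 style: mean of the best precision at recall >= 0.0, 0.1, ..., 1.0.
        ap = 0.0
        for i in range(11):
            t = i / 10.0
            best = max((p for r, p in zip(recalls, precisions) if r >= t), default=0.0)
            ap += best / 11.0
        return ap
    # VOC 2010+ style: exact area under the monotonically decreasing envelope.
    mrec = [0.0] + list(recalls) + [1.0]
    mpre = [0.0] + list(precisions) + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))


toy_recalls = [0.5, 1.0]
toy_precisions = [1.0, 0.5]
ap07 = average_precision(toy_recalls, toy_precisions, use_07_metric=True)
ap12 = average_precision(toy_recalls, toy_precisions)
```

The same detections therefore yield two different scores, so results should only be compared when they were computed under the same metric.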

Conclusion

Our experimental results show that the rotating object detection model combining key points and guide vectors outperforms the mainstream detectors it was compared with. The rotating elliptic Gaussian kernel achieves more accurate key point regression, and the point pair matching module based on guide vectors accurately matches center points and head vertices, improving the detection of rotating objects.

Keywords

object detection; deep learning; rotating elliptic Gaussian kernel; guidance vectors; oriented detection

