Fusion of multi-feature and self-attention for rotating target detection of outdoor helicopter blades
2025, Vol. 30, No. 1: 240-253
Print publication date: 2025-01-16
DOI: 10.11834/jig.230693
Xu F L, Xiong B S, Ou Q F, Yu L and Rao Z B. Fusion of multi-feature and self-attention for rotating target detection of outdoor helicopter blades[J]. Journal of Image and Graphics, 2025, 30(1): 240-253.
Objective
Rotor blade motion parameters are key indicators throughout helicopter design and production. When traditional visual measurement methods are applied directly in outdoor environments, complex illumination and background interference can make the blade region impossible to locate, preventing accurate measurement. To address this, this paper proposes a fusion multi-feature and self-attention rotating detector (FMSA-RD).
Method
First, to address the insufficient and redundant feature extraction of YOLOv5s (you only look once), a more effective multi-feature extraction and fusion module is designed in the backbone network; it combines feature information from different positions and scales to improve the network's detection accuracy for outdoor blades, and some irrelevant convolution layers are removed to simplify the module structure and its parameters. Second, the multi-head self-attention mechanism is fused with the CSP (cross-stage partial convolution) bottleneck structure to integrate global information and suppress interference from complex outdoor illumination backgrounds. Finally, skew intersection over union (SkewIoU) loss and angle loss are introduced to improve the loss function and further raise blade detection accuracy.
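To make the loss design concrete, the following is a minimal Python sketch of a 1 − SkewIoU term plus an angle penalty, computing the skew overlap with shapely polygons; the helper names and the angle-loss weighting are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of a SkewIoU-plus-angle loss for rotated boxes,
# assuming boxes are given as (cx, cy, w, h, theta_radians).
import math
from shapely.geometry import Polygon

def rotated_box_polygon(cx, cy, w, h, theta):
    """Corner polygon of a box rotated by theta around its center."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = []
    for dx, dy in ((-w/2, -h/2), (w/2, -h/2), (w/2, h/2), (-w/2, h/2)):
        corners.append((cx + dx*cos_t - dy*sin_t, cy + dx*sin_t + dy*cos_t))
    return Polygon(corners)

def skew_iou(box_a, box_b):
    """Overlap of two rotated boxes via exact polygon intersection."""
    pa, pb = rotated_box_polygon(*box_a), rotated_box_polygon(*box_b)
    inter = pa.intersection(pb).area
    union = pa.area + pb.area - inter
    return inter / union if union > 0 else 0.0

def rotated_box_loss(pred, target, angle_weight=1.0):
    """1 - SkewIoU plus a smooth angle penalty (periodic in pi)."""
    iou_term = 1.0 - skew_iou(pred, target)
    angle_term = math.sin(pred[4] - target[4]) ** 2
    return iou_term + angle_weight * angle_term

# Same box, 0.3 rad angle error: loss is driven by both terms.
print(rotated_box_loss((50, 50, 40, 10, 0.0), (50, 50, 40, 10, 0.3)))
```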
Result
Multiple comparison experiments were conducted on a self-made outdoor helicopter blade dataset and on the public DOTA-v1.0 (dataset for object detection in aerial images) dataset. Compared with the baseline YOLOv5s detection network, the proposed model improves mean average precision (mAP) by 6.6% and 12.8% and frame rate (frames per second, FPS) by 21.8% and 47.7%, respectively.
Conclusion
The rotating target detection model designed in this paper improves both the accuracy and the speed of blade detection under complex outdoor illumination backgrounds.
Objective
The motion parameters of helicopter rotor blades include the flapping, lead-lag, twist, and coning angles, which provide an important basis for rotor structure design, the upper and lower limit block design of the hub, and blade load design. They must be measured in ground tests before rotorcraft certification and in helicopter flight tests. Traditional visual measurement of rotor blade motion parameters has achieved good results in indoor wind tunnel environments. Outdoors, however, complex backgrounds can make the rotor blades undetectable in the image and prevent accurate parameter measurement. Unlike indoor environments, outdoor scenes present complex lighting that varies with season, weather, time of day, and lighting direction, as well as varying sky and background conditions. Under such interference, the features of the rotor blades are weakened, making their position difficult to locate accurately. Deep learning is the mainstream approach to object detection, and designing deep learning models that enhance blade features while reducing the interference of complex backgrounds is a major challenge. In this paper, an angle prediction branch is added to the YOLOv5s network structure, and a rotating object detector that fuses multiple features and self-attention (FMSA-RD) is proposed for the detection of outdoor helicopter rotor blades.
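As a rough illustration of such an angle branch, the following PyTorch sketch widens a YOLO-style prediction head from (box, objectness, classes) to include one angle channel per anchor; the class name RotatedHead and the channel layout are assumptions for illustration, not the authors' implementation.

```python
# Sketch: YOLO-style head with an extra angle channel per anchor.
import torch
import torch.nn as nn

class RotatedHead(nn.Module):
    """Per-anchor output: 4 box terms + 1 objectness + 1 angle + classes."""
    def __init__(self, in_channels, num_anchors, num_classes):
        super().__init__()
        self.num_outputs = 4 + 1 + 1 + num_classes  # box, obj, angle, cls
        self.pred = nn.Conv2d(in_channels, num_anchors * self.num_outputs, 1)

    def forward(self, x):
        p = self.pred(x)
        b, _, h, w = p.shape
        # (batch, anchors, outputs, H, W): index 5 holds the angle logit.
        return p.view(b, -1, self.num_outputs, h, w)

head = RotatedHead(in_channels=256, num_anchors=3, num_classes=1)
out = head(torch.randn(1, 256, 20, 20))
print(out.shape)  # torch.Size([1, 3, 7, 20, 20])
```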
Method
First, the FMSA-RD model improves the C3 module used for feature extraction in YOLOv5s by adding multiple shortcuts. The improved feature extraction module, called convolution five (C5), completes feature extraction by fusing local features from different positions, thereby simplifying the network structure while maintaining feature extraction ability. Specifically, C5 replaces the BottleNeck modules in C3 with two 3 × 3 convolution kernels, avoiding the overhead of stacking multiple BottleNeck modules while enlarging the receptive field of the convolution-layer feature map. Because increasing the number of convolution layers does not necessarily yield better-optimized parameters and may cause gradient divergence and network degradation, C5 adds shortcut branches to its three main convolution layers to avoid accumulating useless parameters and to extract feature information from different positions. Second, a multi-feature fusion spatial pyramid pooling cross-stage partial fast tiny module enhances the fusion of image features at different scales. This module uses a block-merging method with multiple serial 5 × 5 MaxPool layers to extract four different receptive fields, improving detection accuracy while keeping the model lightweight. Then, to address the weak ability of convolutional neural network (CNN) structures to connect global features, the B3 module is designed: it combines the multi-head self-attention mechanism of the Transformer with the cross-stage partial convolution (CSP) bottleneck structure to improve global feature extraction and suppress the influence of complex outdoor rotor blade backgrounds. Finally, skew intersection over union (SkewIoU) loss and angle loss are introduced to improve the loss function and further enhance the accuracy of blade detection.
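The following PyTorch sketch shows, under assumptions inferred from this description, what a C5-style block (two 3 × 3 convolutions with shortcut branches inside a CSP-style split) and a B3-style block (multi-head self-attention inside a CSP bottleneck) could look like; exact widths, shortcut placement, and normalization are guesses, not the authors' released code.

```python
# Sketches of a C5-style and a B3-style block as described above.
import torch
import torch.nn as nn

def conv_bn_silu(c_in, c_out, k=1, s=1):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(inplace=True),
    )

class C5(nn.Module):
    """Two 3x3 convs with identity shortcuts, fused with a CSP branch."""
    def __init__(self, c_in, c_out):
        super().__init__()
        c_mid = c_out // 2
        self.cv1 = conv_bn_silu(c_in, c_mid, 1)
        self.cv2 = conv_bn_silu(c_in, c_mid, 1)   # parallel CSP branch
        self.m1 = conv_bn_silu(c_mid, c_mid, 3)
        self.m2 = conv_bn_silu(c_mid, c_mid, 3)
        self.cv3 = conv_bn_silu(2 * c_mid, c_out, 1)

    def forward(self, x):
        y = self.cv1(x)
        y = y + self.m1(y)                        # shortcut 1
        y = y + self.m2(y)                        # shortcut 2
        return self.cv3(torch.cat((y, self.cv2(x)), dim=1))

class B3(nn.Module):
    """CSP bottleneck whose inner transform is multi-head self-attention."""
    def __init__(self, c_in, c_out, num_heads=4):
        super().__init__()
        c_mid = c_out // 2
        self.cv1 = conv_bn_silu(c_in, c_mid, 1)
        self.cv2 = conv_bn_silu(c_in, c_mid, 1)
        self.attn = nn.MultiheadAttention(c_mid, num_heads, batch_first=True)
        self.cv3 = conv_bn_silu(2 * c_mid, c_out, 1)

    def forward(self, x):
        y = self.cv1(x)
        b, c, h, w = y.shape
        seq = y.flatten(2).transpose(1, 2)        # (b, h*w, c) tokens
        attn_out, _ = self.attn(seq, seq, seq)    # global interactions
        y = y + attn_out.transpose(1, 2).view(b, c, h, w)
        return self.cv3(torch.cat((y, self.cv2(x)), dim=1))

x = torch.randn(1, 64, 32, 32)
print(C5(64, 64)(x).shape, B3(64, 64)(x).shape)
```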
Result
Our experiments were conducted on a self-made outdoor helicopter rotor blade dataset, a self-made outdoor simulated blade dataset, and the public DOTA-v1.0 dataset for training and validation. The self-made outdoor rotor blade dataset contains 3 878 images, randomly divided into training, testing, and validation sets in a 7:2:1 ratio. FMSA-RD was compared with mainstream horizontal and rotational models, namely RetinaNet, FCOS, YOLOv5s, YOLOv6s, YOLOv7-tiny, CenterRot, FAB + DRB + CRB, H2RBox, and R3Det. Experimental results show that our method achieves an average detection accuracy of 98.5% at 110.5 frames per second. In the comparison experiments on the self-made outdoor blade dataset, the mean average precision (mAP) of FMSA-RD is 14.1%, 7.8%, 6.6%, 3.2%, 3.9%, 3.0%, 3.1%, 2.3%, and 4.2% higher than those of RetinaNet, FCOS, YOLOv5s, YOLOv6s, YOLOv7-tiny, CenterRot, FAB + DRB + CRB, H2RBox, and R3Det, respectively. The public DOTA dataset contains 2 806 remote sensing images with a resolution of 800 × 800 pixels, covering scene types such as cities, industrial areas, buildings, and roads; the comparison with mainstream rotating object detection models on this dataset verifies the generalization ability of the FMSA-RD network. On the self-made outdoor simulated blade dataset, the morning and noon data are used as the training set and the night data as the validation set. The experiments show that FMSA-RD has low computational complexity, high detection accuracy, and good generalization ability, making it suitable for different scenarios and environments.
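For reference, a small Python sketch of the 7:2:1 random split described above; the fixed seed and function name are illustrative assumptions.

```python
# Randomly split a list of samples into train/test/val at 7:2:1.
import random

def split_dataset(items, ratios=(0.7, 0.2, 0.1), seed=0):
    items = list(items)
    random.Random(seed).shuffle(items)   # deterministic shuffle
    n = len(items)
    n_train = int(ratios[0] * n)
    n_test = int(ratios[1] * n)
    return (items[:n_train],
            items[n_train:n_train + n_test],
            items[n_train + n_test:])

train, test, val = split_dataset(range(3878))
print(len(train), len(test), len(val))   # 2714 775 389
```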
Conclusion
Our FMSA-RD reduces complexity while integrating local feature information from different positions and suppressing interference from complex background noise. Fusing features at different scales improves blade detection accuracy, and fusing the self-attention mechanism extracts global information and distinguishes blades without circular markers. The model thus achieves accurate detection of high-aspect-ratio blades against complex backgrounds while reducing model parameters and improving detection accuracy.
Keywords: outdoor helicopter rotor blade; rotating target detection; multi-feature; multi-head self-attention mechanism (MHSA); loss function
Cheng G, Wang J B, Li K, Xie X X, Lang C B, Yao Y Q and Han J W. 2022. Anchor-free oriented proposal generator for object detection. IEEE Transactions on Geoscience and Remote Sensing, 60: #5625411 [DOI: 10.1109/TGRS.2022.3183022]
Ding J, Xue N, Long Y, Xia G S and Lu Q K. 2019. Learning RoI Transformer for oriented object detection in aerial images//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 2844-2853 [DOI: 10.1109/CVPR.2019.00296]
Du Y L, Liu Q Q, Wang L L, Xu X, Wei Q M and Song W. 2022. Multi-scale rotating anchor mechanism based automatic detection of ocean mesoscale eddy. Journal of Image and Graphics, 27(10): 3092-3101 [DOI: 10.11834/jig.210286]
Han J M, Ding J, Xue N and Xia G S. 2021. ReDet: a rotation-equivariant detector for aerial object detection//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 2785-2794 [DOI: 10.1109/CVPR46437.2021.00281]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Huang J C, Wang H and Lu H. 2022. One-stage detectors combining lightweight backbone and multi-scale fusion. Journal of Image and Graphics, 27(12): 3596-3607 [DOI: 10.11834/jig.211028]
Jiang Y Y, Zhu X Y, Wang X B, Yang S L, Li W, Wang H, Fu P and Luo Z B. 2017. R2CNN: rotational region CNN for orientation robust scene text detection [EB/OL]. [2023-09-21]. https://arxiv.org/pdf/1706.09579.pdf
Kim B, Lee J, Lee S, Lee S, Kim D and Kim J. 2022. TricubeNet: 2D kernel-based object representation for weakly-occluded oriented object detection//Proceedings of 2022 IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa, USA: IEEE: 3421-3430 [DOI: 10.1109/WACV51458.2022.00348]
Kim G N, Kim S H, Joo I, Kim G B and Yoo K H. 2023. Center deviation measurement of color contact lenses based on a deep learning model and Hough circle transform. Sensors, 23(14): #6533 [DOI: 10.3390/s23146533]
Li C Y, Li L L, Jiang H L, Weng K H, Geng Y F, Li L, Ke Z D, Li Q Y, Cheng M, Nie W Q, Li Y D, Zhang B, Liang Y F, Zhou L Y, Xu X M, Chu X X, Wei X M and Wei X L. 2022. YOLOv6: a single-stage object detection framework for industrial applications [EB/OL]. [2023-09-21]. https://doi.org/10.48550/arXiv.2209.02976
Lin T Y, Goyal P, Girshick R, He K M and Dollár P. 2017. Focal loss for dense object detection//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2999-3007 [DOI: 10.1109/ICCV.2017.324]
Ma J Q, Shao W Y, Ye H, Wang L, Wang H, Zheng Y B and Xue X Y. 2018. Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia, 20(11): 3111-3122 [DOI: 10.1109/TMM.2018.2818020]
Ou Q F, Xiao J B, Chen G F, Li X M and Xiong B S. 2021. Full-scene measurement and analysis of helicopter blade flaps based on vision. Chinese Journal of Scientific Instrument, 42(1): 146-156 [DOI: 10.19650/j.cnki.cjsi.J2006688]
Qian W, Yang X, Peng S L, Zhang X J and Yan J C. 2022. RSDet++: point-based modulated loss for more accurate rotated object detection. IEEE Transactions on Circuits and Systems for Video Technology, 32(11): 7869-7879 [DOI: 10.1109/TCSVT.2022.3186070]
Ren S Q, He K M, Girshick R and Sun J. 2017. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6): 1137-1149 [DOI: 10.1109/TPAMI.2016.2577031]
Tian Z, Shen C H, Chen H and He T. 2019. FCOS: fully convolutional one-stage object detection//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 9626-9635 [DOI: 10.1109/ICCV.2019.00972]
Ultralytics. 2020. YOLOv5 [EB/OL]. [2023-09-21]. https://github.com/ultralytics/yolov5
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł and Polosukhin I. 2017. Attention is all you need//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 6000-6010
Wang C Y, Bochkovskiy A and Liao H Y M. 2023. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 7464-7475 [DOI: 10.1109/CVPR52729.2023.00721]
Wang C Y, Liao H Y M, Wu Y H, Chen P Y, Hsieh J W and Yeh I H. 2020. CSPNet: a new backbone that can enhance learning capability of CNN//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Seattle, USA: IEEE: 1571-1580 [DOI: 10.1109/CVPRW50498.2020.00203]
Wang J, Yang L and Li F. 2021. Predicting arbitrary-oriented objects as points in remote sensing images. Remote Sensing, 13(18): #3731 [DOI: 10.3390/rs13183731]
Xia G S, Bai X, Ding J, Zhu Z, Belongie S, Luo J B, Datcu M, Pelillo M and Zhang L P. 2018. DOTA: a large-scale dataset for object detection in aerial images//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 3974-3983 [DOI: 10.1109/CVPR.2018.00418]
Xie X X, Cheng G, Wang J B, Yao X W and Han J W. 2021. Oriented R-CNN for object detection//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 3500-3509 [DOI: 10.1109/ICCV48922.2021.00350]
Yang X, Yan J C, Feng Z M and He T. 2021. R3Det: refined single-stage detector with feature refinement for rotating object//Proceedings of the 35th AAAI Conference on Artificial Intelligence. [s.l.]: AAAI: 3163-3171 [DOI: 10.1609/aaai.v35i4.16426]
Yang X, Zhang G F, Li W T, Zhou Y, Wang X H and Yan J C. 2023. H2RBox: horizontal box annotation is all you need for oriented object detection//Proceedings of the 11th International Conference on Learning Representations. Kigali, Rwanda: ICLR: 1-14
Yuan Y, Li Z G and Ma D D. 2022. Feature-aligned single-stage rotation object detection with continuous boundary. IEEE Transactions on Geoscience and Remote Sensing, 60: #5538011 [DOI: 10.1109/TGRS.2022.3203983]
Zheng Z H, Wang P, Liu W, Li J Z, Ye R G and Ren D W. 2020. Distance-IoU loss: faster and better learning for bounding box regression//Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI: 12993-13000 [DOI: 10.1609/aaai.v34i07.6999]
Zhou Q and Yu C H. 2022. Point RCNN: an angle-free framework for rotated object detection. Remote Sensing, 14(11): #2605 [DOI: 10.3390/rs14112605]
Zuo C L, Wei C H, Ma J, Yue T R, Liu L and Shi Z Y. 2021. Full-field displacement measurements of helicopter rotor blades using stereophotogrammetry. International Journal of Aerospace Engineering, 2021: #8811601 [DOI: 10.1155/2021/8811601]