融合边界框高斯建模与特征聚合分发的遥感飞机细粒度识别
Fusion of bounding box Gaussian modeling and feature aggregation distribution for fine-grained recognition of remote sensing aircraft images
- 2025年30卷第1期 页码:282-296
纸质出版日期: 2025-01-16
DOI: 10.11834/jig.230862
王晓燕, 梁文辉, 李杰, 牟建宏, 王禧钰. 融合边界框高斯建模与特征聚合分发的遥感飞机细粒度识别[J]. 中国图象图形学报, 2025,30(1):282-296.
WANG XIAOYAN, LIANG WENHUI, LI JIE, MU JIANHONG, WANG XIYU. Fusion of bounding box Gaussian modeling and feature aggregation distribution for fine-grained recognition of remote sensing aircraft images[J]. Journal of Image and Graphics, 2025, 30(1): 282-296.
目的
遥感飞机影像由于目标尺寸差距大,采集过程中受光照、遮挡等因素的影响,导致不同型号飞机特征相似,小目标检测效果不好,类内无法实现细粒度区分。为了解决上述问题,提出了一种融合边界框高斯建模与特征聚合分发的YOLOv5s(you only look once)遥感飞机细粒度识别算法。
方法
首先,将归一化高斯瓦瑟斯坦距离(normalized Gaussian Wasserstein distance, NWD)与交并比(intersection over union,IoU)及其衍生指标相结合,并合理地设置两者的比例参数,改进YOLOv5s位置损失的计算方式,从而提高模型对小目标的敏感度。其次,在YOLOv5s的Neck部分引入特征聚合分发模块(gather-and-distribute, GD),在原网络“自顶向下、横向连接”的特征融合方式的基础上做到了跨层融合信息,增强网络细粒度特征、全局特征提取能力,提高了整体检测精度。为检验本算法在军用飞机上的细粒度和小目标识别优势,使用遥感飞机细粒度数据集MAR20(military aircraft recognition 20)和遥感飞机小目标数据集CORS-ADD(complex optical remote-sensing aircraft detection dataset)进行实验。
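A minimal sketch of the combined position loss described above, written in a PyTorch style; the mixing weight r, the constant C, and the exact weighted form are illustrative assumptions rather than the authors' original code:

```python
# Hedged sketch: combining the CIoU-based box loss with the normalized Gaussian
# Wasserstein distance (NWD) for YOLOv5s box regression. Boxes are (cx, cy, w, h).
# The default values of r and C are assumptions, not the paper's settings.
import torch

def nwd(pred, target, C=12.8, eps=1e-7):
    """Normalized Gaussian Wasserstein distance between two sets of boxes.

    Each box (cx, cy, w, h) is modeled as a 2D Gaussian with mean (cx, cy) and
    covariance diag(w^2/4, h^2/4); the 2nd-order Wasserstein distance between
    such Gaussians reduces to a Euclidean distance on (cx, cy, w/2, h/2).
    """
    d2 = ((pred[..., 0] - target[..., 0]) ** 2
          + (pred[..., 1] - target[..., 1]) ** 2
          + (pred[..., 2] - target[..., 2]) ** 2 / 4
          + (pred[..., 3] - target[..., 3]) ** 2 / 4)
    return torch.exp(-torch.sqrt(d2 + eps) / C)

def box_loss(pred, target, ciou, r=0.5):
    """Weighted mix of the CIoU loss and the NWD loss.

    `ciou` is the CIoU similarity already computed by YOLOv5's bbox_iou();
    r = 0.5 corresponds to a 1:1 IoU:NWD ratio and r = 0.9 to 1:9.
    """
    return (1.0 - r) * (1.0 - ciou) + r * (1.0 - nwd(pred, target))
```

Because NWD depends on center offsets and width/height differences rather than on pixel overlap, it still provides a smooth similarity signal when a small predicted box barely overlaps its ground truth, which is the motivation for mixing it with the IoU-based term.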
结果
实验结果显示:对于数据集MAR20和CORS-ADD,本文模型精确度分别达到了99.10%和95.36%,与原YOLOv5s、YOLOv8s、Gold-YOLO和Faster R-CNN相比检测精度最佳;实验验证了模型在细粒度和小目标检测方面性能更加优秀,检测结果与真实结果更加接近。
结论
实验结果表明,本文算法在检测性能和模型精度上的表现优于上述4种目标检测算法,模型具有良好的实用价值。
Objective
As a basic branch of computer vision, object detection plays an important role in subsequent tasks such as image segmentation and object tracking. It aims to find all the objects in an image and determine their locations and categories, and it has broad applications in industrial inspection, aerospace, autonomous driving, and other fields. Aircraft detection in remote sensing images is of great significance to both military and civilian applications such as air traffic control and battlefield dynamic monitoring. Remote sensing aircraft images exhibit large differences in object size, and the acquisition process is affected by factors such as lighting and occlusion; as a result, different types of aircraft show similar characteristics, small objects are detected poorly, and fine-grained distinction within categories cannot be achieved. In object detection, the loss function measures the difference between the model prediction and the actual object and directly affects the performance and convergence speed of the model. Adjusting the model parameters so that the loss function reaches its minimum improves the accuracy of the model on the test set. The loss function of YOLOv5 consists of position loss, classification loss, and confidence loss. For the position loss, YOLOv5 uses complete IoU (CIoU), a derivative of the intersection over union (IoU), by default and provides IoU, generalized IoU, and distance IoU as alternatives. However, for small object detection, especially with anchor box-based algorithms such as YOLOv5, the IoU family of metrics cannot meet application needs well. Different types of remote sensing aircraft have fine-grained characteristics, which are reflected in subtle differences between classes and large variations within classes. For fine-grained recognition tasks, extracting local information is crucial. The feature fusion module PANet used by YOLOv5s cannot achieve global feature fusion and is not conducive to extracting fine-grained features. To solve the above problems, this article proposes an improved algorithm based on YOLOv5s.
Method
In view of the shortcomings of IoU in small object detection with YOLOv5, this article introduces the Gaussian Wasserstein distance into the calculation of bounding box overlap to improve the detection performance of the network. Unlike the IoU family of algorithms, which compute the similarity between predicted and ground-truth boxes from the sets of pixels the boxes contain, the Gaussian Wasserstein distance abandons the pixel-set view, models each bounding box as a two-dimensional Gaussian distribution, and uses a new metric, the normalized Gaussian Wasserstein distance (NWD), to calculate the similarity between boxes, which fundamentally addresses the weakness of IoU in small object detection. In response to PANet's shortcomings in fine-grained detection, this article introduces the gather-and-distribute (GD) feature aggregation module of Gold-YOLO into YOLOv5s to enhance the network's ability to extract fine-grained features through convolution and self-attention mechanisms. 1) The loss function of YOLOv5s is improved by combining the Gaussian Wasserstein distance with the traditional IoU. 2) The gather-and-distribute feature aggregation module is introduced in the neck part of YOLOv5s to enhance the network's local feature extraction capability. Through these two methods, the overall detection accuracy is improved. To test the advantages of the algorithm in fine-grained and small object recognition of military aircraft, this paper conducts experiments on the remote sensing aircraft fine-grained dataset MAR20 and the remote sensing aircraft small object dataset CORS-ADD. In the field of remote sensing military aircraft identification, different types of aircraft often have similar characteristics, which makes it difficult to distinguish between sub-categories. This article therefore uses the open-source object detection remote sensing dataset military aircraft recognition 20 (MAR20) to achieve fine-grained recognition of remote sensing military aircraft. The dataset contains a total of 3 842 images covering 20 military aircraft models (SU-35, C-130, C-17, C-5, F-16, TU-160, E-3, B-52, P-3C, B-1B, E-8, TU-22, F-15, KC-135, F-22, FA-18, TU-95, KC-10, SU-34, SU-24). The CORS-ADD dataset is a complex optical remote sensing aircraft small object dataset manually annotated and constructed by the Space Optical Engineering Research Center of Harbin Institute of Technology. It contains 7 337 images with 32 285 aircraft instances, and the object size ranges from 4 × 4 pixels to 240 × 240 pixels. Unlike earlier remote sensing datasets built from a single data source, CORS-ADD is collected from image sources such as Google Maps and the WorldView-2, WorldView-3, Pleiades, Jilin-1, and IKONOS satellites, covering airports, aircraft carriers, oceans, land, and other scenarios, as well as aircraft objects such as bombers, fighter jets, and early-warning aircraft at typical airports in China and the United States.
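As a rough illustration of the gather-and-distribute idea, the sketch below is a heavily simplified assumption, not Gold-YOLO's actual GD implementation (which uses separate low-stage and high-stage branches with attention-based fusion); the channel sizes and the additive injection rule are illustrative choices.

```python
# Hedged sketch of a gather-and-distribute (GD) style fusion block: all pyramid
# levels are gathered to one scale, fused globally, then redistributed and
# injected back into every level.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGatherDistribute(nn.Module):
    def __init__(self, channels=(128, 256, 512), fused_channels=256):
        super().__init__()
        # 1x1 convs that project every level to a common channel width
        self.align = nn.ModuleList([nn.Conv2d(c, fused_channels, 1) for c in channels])
        # fuse the gathered (concatenated) features into one global descriptor
        self.fuse = nn.Conv2d(fused_channels * len(channels), fused_channels, 1)
        # per-level convs that inject the global feature back into each branch
        self.inject = nn.ModuleList([nn.Conv2d(fused_channels, c, 1) for c in channels])

    def forward(self, feats):
        # Gather: resize every level to the middle level's resolution and concatenate
        size = feats[1].shape[-2:]
        gathered = torch.cat(
            [F.interpolate(a(f), size=size, mode="bilinear", align_corners=False)
             for a, f in zip(self.align, feats)], dim=1)
        fused = self.fuse(gathered)
        # Distribute: resize the fused feature back and add it to each level
        outs = []
        for inj, f in zip(self.inject, feats):
            g = F.interpolate(fused, size=f.shape[-2:], mode="bilinear", align_corners=False)
            outs.append(f + inj(g))
        return outs

# Usage with dummy P3/P4/P5 feature maps
if __name__ == "__main__":
    p3, p4, p5 = (torch.randn(1, 128, 80, 80),
                  torch.randn(1, 256, 40, 40),
                  torch.randn(1, 512, 20, 20))
    for o in SimpleGatherDistribute()([p3, p4, p5]):
        print(o.shape)
```

The point of the gather step is that every pyramid level can see information from every other level in a single hop, rather than only from its neighbors as in the top-down and lateral fusion of PANet.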
Result
To test the effect of the two improved modules on remote sensing aircraft recognition based on YOLOv5s, this article compares the performance of the original YOLOv5s with models that introduce the normalized Gaussian Wasserstein distance (NWD) (r is the weight parameter used to adjust the ratio of IoU to NWD) and the GD module. The experimental results show that introducing NWD and GD improves recognition accuracy to varying degrees, confirming that both improvements are effective. When the ratio of IoU to NWD is 1:1, the recognition effect on the MAR20 dataset is best; when the ratio is 1:9, the recognition effect on the CORS-ADD dataset is best. For the MAR20 dataset, compared with YOLOv5s, YOLOv8s, and Gold-YOLO, the mAP of the improved YOLOv5s increased by 1.1%, 0.7%, and 1.8%, respectively; for the CORS-ADD dataset, mAP increased by 0.6%, 1.7%, and 3.9%, respectively.
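Read as an assumption about how the weight parameter r enters the position loss (not the paper's exact formula), the reported ratios correspond to:

```latex
% Assumed weighted combination of the IoU-based loss and the NWD-based loss.
% r = 0.5 reproduces the 1:1 IoU:NWD ratio reported as best on MAR20;
% r = 0.9 reproduces the 1:9 ratio reported as best on CORS-ADD.
\mathcal{L}_{\text{box}} = (1 - r)\,(1 - \text{CIoU}) + r\,(1 - \text{NWD})
```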
Conclusion
An improved YOLOv5s network is proposed to solve the problems of large object size differences and high intra-class similarity in remote sensing aircraft image recognition. On the basis of YOLOv5s, the loss function is improved by combining the Gaussian Wasserstein distance with the traditional IoU metric, which improves the detection of objects of different sizes and thereby the detection accuracy of the model. At the same time, to address the problem that different types of aircraft have similar characteristics and sub-categories are difficult to distinguish, this article uses the gather-and-distribute feature aggregation module of Gold-YOLO to enhance the ability of the YOLOv5s network to extract fine-grained features. A comparison shows that the improved YOLOv5s achieves better detection accuracy than YOLOv5s, YOLOv8s, Gold-YOLO, and Faster R-CNN. To increase the image processing speed of the model without reducing its accuracy, and to reduce the consumption of computing resources for lightweight deployment in the future, follow-up work will consider replacing the C3 module in the YOLOv5s detection part with C3_DSConv to improve detection speed and make the model lightweight.
目标检测；改进YOLOv5s；遥感飞机影像；细粒度识别；特征融合
object detection; improved YOLOv5s; remote sensing aircraft imagery; fine-grained recognition; feature fusion
Gai R L, Chen N and Yuan H. 2023. A detection algorithm for cherry fruits based on the improved YOLO-v4 model. Neural Computing and Applications, 35(19): 13895-13906 [DOI: 10.1007/s00521-021-06029-z]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Hua Y R, Zhang Z, Long S and Zhang Q L. 2020. Remote sensing image target detection based on improved YOLO algorithm. Electronic Measurement Technology, 43(24): 87-92
化嫣然, 张卓, 龙赛, 张青林. 2020. 基于改进YOLO算法的遥感图像目标检测. 电子测量技术, 43(24): 87-92 [DOI: 10.19651/j.cnki.emt.2005268]
Huang J, Jiang Z G, Zhang H P and Yao Y. 2017. Ship object detection in remote sensing images using convolutional neural networks. Journal of Beijing University of Aeronautics and Astronautics, 43(9): 1841-1848
黄洁, 姜志国, 张浩鹏, 姚远. 2017. 基于卷积神经网络的遥感图像舰船目标检测. 北京航空航天大学学报, 43(9): 1841-1848 [DOI: 10.13700/j.bh.1001-5965.2016.0755]
Li X, Liu S N, Yang Z and Wang K K. 2021. Multi-target detection of transmission lines based on improved cascade R-CNN. Journal of Electronic Measurement and Instrumentation, 35(10): 24-32
李鑫, 刘帅男, 杨桢, 王珂珂. 2021. 基于改进Cascade R-CNN的输电线路多目标检测. 电子测量与仪器学报, 35(10): 24-32 [DOI: 10.13382/j.jemi.B2104058]
Lin T Y, Dollár P, Girshick R, He K M, Hariharan B and Belongie S. 2017. Feature pyramid networks for object detection [EB/OL]. [2017-04-19]. http://arxiv.org/pdf/1612.03144.pdf
Liu Q L, Jiao B L and Liu L. 2009. On remote sensing image classification method based on improved BP neural network model. Electronics Optics and Control, 16(8): 65-67
刘钦龙, 焦斌亮, 刘立. 2009. 基于改进的BP神经网络模型的遥感图像分类方法研究. 电光与控制, 16(8): 65-67 [DOI: 10.3969/j.issn.1671-637X.2009.08.016]
Liu S, Qi L, Qin H F, Shi J P and Jia J Y. 2018. Path aggregation network for instance segmentation [EB/OL]. [2023-12-19]. http://arxiv.org/pdf/1803.01534.pdf
Liu Y, Fu Z Y and Zheng F B. 2015. Review on high resolution remote sensing image classification and recognition. Journal of Geo-Information Science, 17(9): 1080-1091
刘扬, 付征叶, 郑逢斌. 2015. 高分辨率遥感影像目标分类与识别研究进展. 地球信息科学学报, 17(9): 1080-1091 [DOI: 10.3724/SP.J.1047.2015.01080]
Minaee S, Boykov Y Y, Porikli F, Plaza A J, Kehtarnavaz N and Terzopoulos D. 2022. Image segmentation using deep learning: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(7): 3523-3542 [DOI: 10.1109/TPAMI.2021.3059968]
Oh M and Park H M. 2011. Blind source separation based on independent vector analysis using feed-forward network. Neurocomputing, 74(17): 3713-3715 [DOI: 10.1016/j.neucom.2011.06.008]
Pan X Y, Jia N X, Mu Y Z and Gao X R. 2023. Survey of small object detection. Journal of Image and Graphics, 28(9): 2587-2615
潘晓英, 贾凝心, 穆元震, 高炫蓉. 2023. 小目标检测研究综述. 中国图象图形学报, 28(9): 2587-2615 [DOI: 10.11834/jig.220455]
Pang Y W, Zhang K, Yuan Y and Wang K Q. 2014. Distributed object detection with linear SVMs. IEEE Transactions on Cybernetics, 44(11): 2122-2133 [DOI: 10.1109/TCYB.2014.2301453]
Redmon J, Divvala S, Girshick R and Farhadi A. 2016. You only look once: unified, real-time object detection//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 779-788 [DOI: 10.1109/CVPR.2016.91]
Shelhamer E, Long J and Darrell T. 2017. Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4): 640-651 [DOI: 10.1109/TPAMI.2016.2572683]
Shi T J, Gong J N, Jiang S K, Zhi X Y, Bao G Z, Sun Y and Zhang W. 2023. Complex optical remote-sensing aircraft detection dataset and benchmark. IEEE Transactions on Geoscience and Remote Sensing, 61: #5612309 [DOI: 10.1109/TGRS.2023.3283137]
Shi W X, Tan D L and Bao S L. 2020. Feature enhancement SSD algorithm and its application in remote sensing images target detection. Acta Photonica Sinica, 49(1): #0128002
史文旭, 谭代伦, 鲍胜利. 2020. 特征增强SSD算法及其在遥感目标检测中的应用. 光子学报, 49(1): #0128002 [DOI: 10.3788/gzxb20204901.0128002]
Shi Z H, Wu C W, Li C J, You Z Z, Wang Q and Ma C C. 2023. Object detection techniques based on deep learning for aerial remote sensing images: a survey. Journal of Image and Graphics, 28(9): 2616-2643
石争浩, 仵晨伟, 李成建, 尤珍臻, 王泉, 马城城. 2023. 航空遥感图像深度学习目标检测技术研究进展. 中国图象图形学报, 28(9): 2616-2643 [DOI: 10.11834/jig.221085]
Sugiarto B, Prakasa E, Wardoyo R, Damayanti R, Krisdianto N, Dewi L M, Pardede H F and Rianto Y. 2017. Wood identification based on histogram of oriented gradient (HOG) feature and support vector machine (SVM) classifier//Proceedings of the 2nd International Conference on Information Technology, Information Systems and Electrical Engineering (ICITISEE). Yogyakarta, Indonesia: IEEE: 337-341 [DOI: 10.1109/ICITISEE.2017.8285523]
Šulc M and Matas J. 2017. Fine-grained recognition of plants from images. Plant Methods, 13: #115 [DOI: 10.1186/s13007-017-0265-4]
Wang C C, He W, Nie Y, Guo J Y, Liu C J, Han K and Wang Y H. 2023. Gold-YOLO: efficient object detector via gather-and-distribute mechanism [EB/OL]. [2023-12-19]. https://arxiv.org/pdf/2309.11331.pdf
Wang J W, Xu C, Yang W and Yu L. 2022. A normalized Gaussian Wasserstein distance for tiny object detection [EB/OL]. [2023-12-19]. http://arxiv.org/pdf/2110.13389.pdf
Yang Z M and Song W. 2023. Selecting and fusing coarse-and-fine granularity features for fine-grained image recognition. Journal of Image and Graphics, 28(7): 2081-2092
阳治民, 宋威. 2023. 选择并融合粗细粒度特征的细粒度图像识别. 中国图象图形学报, 28(7): 2081-2092 [DOI: 10.11834/jig.220052]
Yilmaz A, Javed O and Shah M. 2006. Object tracking: a survey. ACM Computing Surveys (CSUR), 38(4): #13-es [DOI: 10.1145/1177352.1177355]
Yuan X, Cheng G, Li G, Dai W, Yin W X, Feng Y C, Yao X W, Huang Z L, Sun X and Han J W. 2023. Progress in small object detection for remote sensing images. Journal of Image and Graphics, 28(6): 1662-1684
袁翔, 程塨, 李戈, 戴威, 尹文昕, 冯瑛超, 姚西文, 黄钟泠, 孙显, 韩军伟. 2023. 遥感影像小目标检测研究进展. 中国图象图形学报, 28(6): 1662-1684 [DOI: 10.11834/jig.221202]
Zhang X S and Gao T. 2020. Multi-head attention model for aspect level sentiment analysis. Journal of Intelligent and Fuzzy Systems, 38(1): 89-96 [DOI: 10.3233/JIFS-179383]
Zhao L Q and Li S Y. 2020. Object detection algorithm based on improved YOLOv3. Electronics, 9(3): #537 [DOI: 10.3390/electronics9030537]
Zheng Z H, Wang P, Liu W, Li J Z, Ye R G and Ren D W. 2020. Distance-IoU loss: faster and better learning for bounding box regression//Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI: 12993-13000 [DOI: 10.1609/aaai.v34i07.6999]
Zhu X K, Lyu S C, Wang X and Zhao Q. 2021. TPH-YOLOv5: improved YOLOv5 based on Transformer prediction head for object detection on drone-captured scenarios//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision Workshops. Montreal, Canada: IEEE: 2778-2788 [DOI: 10.1109/ICCVW54120.2021.00312]