Infrared aircraft detection algorithm based on adaptive weighted fusion of global-local context
2024, pp. 1-14
Online publication date: 2024-09-03
DOI: 10.11834/jig.240271
Xu Hongpeng, Liu Gang, Xi Jiangtao, et al. Infrared aircraft detection algorithm based on adaptive weighted fusion of global-local context[J]. Journal of Image and Graphics, 2024.
Objective
To address the problem in long-range infrared aircraft detection that target features cannot be fully extracted, owing to the small imaging area and weak radiation intensity of the target, which in turn degrades detection performance, we propose an infrared aircraft detection algorithm based on an adaptive weighted fusion of global-local context (AWFGLC) mechanism.
Method
Based on the AWFGLC mechanism, the input feature map is randomly partitioned and reorganized along the channel dimension and split into two feature maps. One feature map undergoes global context modeling with self-attention, which establishes the correlation between target and background features and highlights the more salient target features, so that the detection algorithm better perceives the target's global characteristics. The other feature map is partitioned into windows, and max pooling and average pooling are applied within each window to highlight local target features; self-attention is then applied to the pooled feature map for local context modeling, which establishes the correlation between the target and its surrounding neighborhood and further strengthens the weaker parts of the target features, so that the detection algorithm better perceives the target's local characteristics. According to the target characteristics, an adaptive weighted fusion strategy with learnable parameters aggregates the global-context and local-context feature maps into a feature map containing relatively complete target information, enhancing the detection algorithm's ability to discriminate targets from background.
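The pipeline above can be sketched end to end. This is a minimal NumPy forward-pass sketch, not the paper's implementation: the window size, attention scaling, nearest-neighbour upsampling of the pooled branch, and the initial fusion weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_branch(x):
    # Self-attention over all H*W positions of a (C, H, W) map:
    # every pixel attends to every other pixel (global context).
    c, h, w = x.shape
    t = x.reshape(c, h * w).T                 # (HW, C) tokens
    attn = softmax(t @ t.T / np.sqrt(c))      # (HW, HW) pairwise correlations
    return (attn @ t).T.reshape(c, h, w)

def local_branch(x, win=2):
    # Window-wise max + average pooling to highlight local features,
    # then self-attention among the pooled windows (local context).
    c, h, w = x.shape
    xw = x.reshape(c, h // win, win, w // win, win)
    pooled = xw.max(axis=(2, 4)) + xw.mean(axis=(2, 4))   # (C, H/win, W/win)
    attended = global_branch(pooled)                      # attention over windows
    # Upsample back to (C, H, W) by nearest-neighbour repeat (assumption).
    return attended.repeat(win, axis=1).repeat(win, axis=2)

def awfglc(x, alpha=0.5, beta=0.5):
    # Randomly reorganize channels, split in half, run both branches,
    # then fuse with softmax-normalized learnable weights.
    c = x.shape[0]
    perm = rng.permutation(c)
    g = global_branch(x[perm[: c // 2]])
    l = local_branch(x[perm[c // 2:]])
    wts = softmax(np.array([alpha, beta]))    # stand-ins for learnable parameters
    return np.concatenate([wts[0] * g, wts[1] * l], axis=0)

x = rng.standard_normal((8, 4, 4)).astype(np.float32)
y = awfglc(x)
print(y.shape)   # (8, 4, 4): same shape as the input feature map
```

The module is shape-preserving, which is what lets it be dropped into an existing backbone between convolution stages.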
Result
The AWFGLC mechanism is integrated into YOLOv7 and applied to infrared aircraft detection. Experimental results show that the proposed algorithm reaches mAP50 of 97.8% and 88.7%, and mAP50:95 of 65.7% and 61.2%, on a self-built and a public infrared aircraft dataset, respectively.
Conclusion
The proposed infrared aircraft detection algorithm outperforms classical object detection algorithms and can effectively detect infrared aircraft targets.
Objective
To address the problem that target features cannot be fully extracted in long-range infrared aircraft detection, owing to the small imaging area and weak radiation intensity of the target, we propose an infrared aircraft target detection algorithm based on adaptive weighted fusion of global-local context (AWFGLC). The global context tends to focus on the overall distribution of the target and can provide the detection algorithm with global information about targets with strong radiation intensity and clear contours. The local context tends to focus on the local details of the target and the background around it, and can provide the detection algorithm with local information about targets with weak radiation intensity and blurred contours. Therefore, in practical applications, the global and local context should be combined according to the target characteristics.
Method
Based on the global-local context adaptive weighted fusion mechanism, the input feature map is randomly partitioned and reorganized along the channel dimension, splitting it into two feature maps. Compared with modeling context on a fixed channel arrangement, or simply splitting the input feature map in two, random reorganization with a different random number in each training round diversifies the channel arrangements seen during iterative training, so that global and local context modeling over differently arranged feature maps helps the detection algorithm learn more diverse and comprehensive features. Meanwhile, dividing the input feature map equally along the channel dimension and performing global and local context modeling on the two halves respectively reduces the complexity of context modeling by half. One feature map undergoes global context modeling with self-attention, which establishes dependencies between each pixel and all other pixels in the feature map, builds the correlation between target and background features, and highlights the more salient target features, enabling the detection algorithm to better perceive the global features of the target.
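The halved-complexity claim can be checked with a quick sketch: self-attention over all HW positions of a C-channel map is dominated by two (HW x C)-shaped matrix products, so its cost is linear in C, and running each branch on half the channels halves the per-branch cost. The scaling factor and the flop estimate below are illustrative assumptions, not the paper's exact accounting.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_context(x):
    """Self-attention over all spatial positions of a (C, H, W) map."""
    c, h, w = x.shape
    tokens = x.reshape(c, h * w).T               # (HW, C)
    scores = tokens @ tokens.T / np.sqrt(c)      # pixel-pixel correlations
    out = softmax(scores) @ tokens               # aggregate global context
    return out.T.reshape(c, h, w)

def attn_flops(c, h, w):
    # The two (HW x C) @ (C x HW)-style products dominate the cost.
    return 2 * (h * w) ** 2 * c

x = np.random.default_rng(1).standard_normal((16, 8, 8))
full = attn_flops(16, 8, 8)     # attention over all 16 channels
half = attn_flops(8, 8, 8)      # attention over one 8-channel half
print(global_context(x).shape, half / full)   # (16, 8, 8) 0.5
```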
The other feature map is partitioned into windows, and max pooling and average pooling are performed within each window to highlight local target features; the pooled feature map then undergoes local context modeling with self-attention, which establishes the correlation between the target and its surrounding neighborhood, further enhances the weaker parts of the target features, and enables the detection algorithm to better perceive the local features of the target. According to the target characteristics, the global-context and local-context feature maps are adaptively weighted and channel-concatenated using a fusion strategy with learnable parameters, which are updated by the optimizer under the guidance of minimizing the detection loss. Convolution, batch normalization, and an activation function then produce feature maps containing more complete target information, enhancing the detection algorithm's ability to discriminate between target and background. The AWFGLC mechanism is incorporated into the YOLOv7 feature extraction network, and context modeling is performed on the 4x- and 32x-downsampled feature maps to make full use of the detailed spatial information in shallow feature maps and the semantic information in deep feature maps, improving the model's ability to extract infrared aircraft target features.
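The fusion step described above can be sketched as follows. This is a minimal NumPy illustration, assuming softmax-normalized learnable scalars for the weighting and a channel-mixing matrix as a stand-in for the 1x1 convolution; batch normalization is omitted and a ReLU-style activation replaces the paper's unspecified activation.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def fuse(global_feat, local_feat, params):
    """Adaptively weight the two context maps, channel-concatenate them,
    then mix with a 1x1-convolution stand-in and an activation."""
    w = softmax(params["alpha"])                 # learnable fusion weights
    stacked = np.concatenate([w[0] * global_feat,
                              w[1] * local_feat], axis=0)      # (2C, H, W)
    mixed = np.einsum("oc,chw->ohw", params["conv1x1"], stacked)  # back to C
    return np.maximum(mixed, 0.0)                # activation (ReLU stand-in)

rng = np.random.default_rng(2)
g = rng.standard_normal((4, 8, 8))               # global-context branch output
l = rng.standard_normal((4, 8, 8))               # local-context branch output
params = {"alpha": np.zeros(2),                  # zeros -> equal 0.5/0.5 split
          "conv1x1": rng.standard_normal((4, 8)) * 0.1}
out = fuse(g, l, params)
print(out.shape)   # (4, 8, 8)
```

Because `alpha` feeds through a softmax, the two branch weights always sum to one, and gradient descent on the detection loss can shift the balance toward whichever context the dataset favors.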
Result
Experimental results show that, on the self-built infrared aircraft dataset, the proposed AWFGLC mechanism outperforms context mechanisms such as Global Context, Position Attention, and Local Transformer in detection accuracy, while increasing the number of parameters and the computational cost. On this dataset, the AWFGLC mechanism tends to learn the global features of the target; compared with the second-best detector, YOLOv7, the proposed AWFGLC-YOLO algorithm improves mAP50 and mAP50:95 by 1.4% and 4.4%, respectively. On the public dataset for weak aircraft target detection and tracking in infrared imagery under ground/air backgrounds, the AWFGLC mechanism tends to learn the local features of the target; compared with the second-best detector, YOLOv8, AWFGLC-YOLO improves mAP50 and mAP50:95 by 3.0% and 4.0%, respectively.
Conclusion
The infrared aircraft detection algorithm proposed in this paper outperforms classical object detection algorithms and can effectively detect infrared aircraft targets.
Keywords: infrared aircraft; target detection; global context; local context; adaptive weighting