A benchmark dataset for high-altitude UAV multi-modal tracking
2025, Vol. 30, No. 2, Pages 361-374
Print publication date: 2025-02-16
DOI: 10.11834/jig.240040
Xiao Yun, Cao Dan, Li Chenglong, Jiang Bo, Tang Jin. 2025. A benchmark dataset for high-altitude UAV multi-modal tracking. Journal of Image and Graphics, 30(02):0361-0374
Objective
Owing to their ease of operation and flexibility, unmanned aerial vehicles (UAVs) have in recent years been widely applied in many military and civilian fields. Compared with low-altitude UAVs, high-altitude UAVs have a wider field of view and stronger concealment, giving them higher application value in intelligence reconnaissance, disaster relief, and similar tasks. However, existing research on UAV multi-modal object tracking mainly targets low-altitude UAVs; the lack of a high-altitude UAV multi-modal tracking dataset limits research and development in this field.
Method
We construct HiAl (high-altitude UAV multi-modal tracking dataset), a dataset for evaluating high-altitude UAV multi-modal object tracking methods. It consists mainly of visible-infrared multi-modal videos captured at an altitude of 500 m by a UAV equipped with a hybrid sensor. The two modalities are precisely registered and annotated frame by frame, so the dataset can soundly evaluate how different multi-modal tracking methods perform on a high-altitude UAV platform.
Result
Twelve mainstream multi-modal tracking methods are compared on the proposed dataset and on datasets without high-altitude UAV scenes. The TBSI (template-bridged search region interaction) method reaches a PR (precision rate) of 0.871 on the RGBT234 dataset (RGB-thermal dataset) but only 0.527 on the proposed dataset, a drop of 39.5%; its SR (success rate) falls from 0.637 on RGBT234 to 0.468 on the proposed dataset, a drop of 26.5%. Compared with RGBT234, the PR of HMFT (hierarchical multi-modal fusion tracker) on the proposed dataset drops by 23.6% and its SR by 14%. In addition, six methods were retrained on the HiAl dataset, and the performance of all retrained methods improved.
Conclusion
This paper presents a multi-modal object tracking dataset based on a high-altitude UAV platform, aiming to promote research on applying multi-modal object tracking to high-altitude UAVs. The HiAl dataset is available online at https://github.com/mmic-lcl/Datasets-and-benchmark-code/tree/main.
Objective
Unmanned aerial vehicles (UAVs) have become crucial tools in both military and civilian contexts owing to their flexibility and ease of operation. High-altitude UAVs offer distinct advantages over low-altitude UAVs, such as wider fields of view and stronger concealment, which make them highly valuable in intelligence reconnaissance, emergency rescue, and disaster relief. However, tracking objects from a high-altitude UAV introduces considerable challenges, including UAV rotation, tiny objects, complex background changes, and low object resolution. Current research on UAV multi-modal object tracking focuses primarily on low-altitude platforms. For example, the VTUAV (visible-thermal UAV) dataset for UAV multi-modal tracking was shot in low-altitude airspace of 5-20 m and fully reflects the unique perspective of low-altitude UAVs. The scenes captured by high-altitude UAVs, however, differ significantly from those captured at low altitude, so this dataset cannot support the development of high-altitude UAV multi-modal object tracking. The absence of an evaluation dataset for high-altitude UAV multi-modal tracking has thus become the data bottleneck that hinders research and development in this field.
Method
This study proposes HiAl, an evaluation dataset built specifically for multi-modal object tracking with high-altitude UAVs, captured at an altitude of approximately 500 m. The UAV used to shoot the dataset carries a hybrid sensor that records video in the visible and infrared modalities simultaneously. The collected multi-modal videos were carefully registered to support high-quality ground-truth annotation and a fair comparison of multi-modal tracking methods. First, the two video modalities were manually aligned so that the same tracked object occupies the same position in each pair of frames. During registration, accurately aligning the region around the tracked object is the top priority; under this premise, the remaining areas of the image also become roughly aligned. Then, on the basis of this high alignment, an accurate ground-truth annotation is provided for every frame: a horizontal (axis-aligned) bounding box is drawn to fit the contour of the tracked object as tightly as possible. Because the two modalities are aligned, they can share the same ground truth, which allows different multi-modal tracking methods to be evaluated consistently on the high-altitude UAV platform. Tracking attributes, scenes, and object categories were all considered during data collection to ensure the diversity and authenticity of the dataset. The dataset covers different lighting and weather conditions, including night and foggy days, and nine object categories common in high-altitude UAV scenes. It provides 12 tracking attributes, two of which are unique to UAVs and carry rich practical significance and high challenge. In contrast to existing multi-modal tracking datasets, most targets in this dataset are small, which is a realistic challenge of high-altitude UAV imaging.
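The shared-annotation idea above can be sketched in miniature. The authors align the modalities manually; the snippet below is only an illustrative stand-in that assumes a planar projective (homography) relation between the two sensors, and the 3x3 matrix `H` is hypothetical. It shows why registration lets both modalities share one axis-aligned ground-truth box: a box drawn in one frame can be mapped into the other and re-enclosed.

```python
def apply_homography(H, point):
    """Map an (x, y) point through a 3x3 homography H (row-major nested lists)."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]  # projective normalization term
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def warp_box(H, box):
    """Warp an (x, y, w, h) box through H and return its axis-aligned enclosure."""
    x, y, bw, bh = box
    corners = [(x, y), (x + bw, y), (x, y + bh), (x + bw, y + bh)]
    warped = [apply_homography(H, c) for c in corners]
    xs = [p[0] for p in warped]
    ys = [p[1] for p in warped]
    return (min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))

# With an identity homography the modalities are perfectly registered,
# so the annotation transfers unchanged and one ground truth serves both.
H_identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Once the transform between modalities is (approximately) the identity after manual alignment, annotating one stream is enough, which is the premise behind HiAl's shared frame-level ground truth.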
Result
The performance of 10 mainstream multi-modal tracking methods on this dataset is compared with their performance on a non-high-altitude UAV dataset. Two common quantitative metrics, precision rate (PR) and success rate (SR), are used to assess each method. Taking two outstanding methods as examples, the PR of the template-bridged search region interaction (TBSI) method reached 0.871 on the RGB-thermal dataset RGBT234 but only 0.527 on the proposed dataset, a drop of 39.5%; its SR decreased from 0.637 on RGBT234 to 0.468 on the proposed dataset, a drop of 26.5%. Compared with RGBT234, the PR and SR of the hierarchical multi-modal fusion tracker (HMFT) on HiAl decreased by 23.6% and 14%, respectively. In addition, HiAl was used to retrain six methods, and the comparative results show that all six improved after retraining. For example, the PR of the duality-gated mutual condition network (DMCNet) increased from 0.485 before retraining to 0.524, and its SR increased from 0.512 to 0.526. These results reflect the strong challenge posed by the dataset and the necessity of building it.
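For readers unfamiliar with the two metrics, the following sketch implements them under the standard single-object tracking evaluation protocol, which we assume here (the paper's exact thresholds are not stated in the abstract): PR is the fraction of frames whose predicted box center lies within a pixel threshold (commonly 20 px) of the ground-truth center, and SR is reported as the area under the success curve, i.e., the mean success ratio swept over IoU thresholds in [0, 1].

```python
def center_error(box_a, box_b):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    ax, ay = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx, by = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5

def iou(box_a, box_b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[0] + box_a[2], box_b[0] + box_b[2])
    y2 = min(box_a[1] + box_a[3], box_b[1] + box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def precision_rate(preds, gts, threshold=20.0):
    """Fraction of frames whose center error is within `threshold` pixels."""
    hits = sum(center_error(p, g) <= threshold for p, g in zip(preds, gts))
    return hits / len(gts)

def success_rate(preds, gts, steps=21):
    """AUC of the success plot: mean success over IoU thresholds in [0, 1]."""
    ious = [iou(p, g) for p, g in zip(preds, gts)]
    thresholds = [i / (steps - 1) for i in range(steps)]
    return sum(sum(v > t for v in ious) / len(ious) for t in thresholds) / steps
```

On small, low-resolution targets such as those in HiAl, a few pixels of drift already pushes the center error past the PR threshold and collapses the IoU, which is one plausible reason the same trackers score far lower here than on RGBT234.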
Conclusion
This study introduces an evaluation dataset for assessing the performance of multi-modal object tracking methods on high-altitude UAVs. The multi-modal data were collected in real scenes, carefully registered, and given frame-level ground-truth annotations, yielding a dedicated, high-quality dataset for high-altitude UAV multi-modal tracking. The proposed HiAl dataset can serve as a standard evaluation tool for future research, offering researchers authentic and varied data on which to evaluate their algorithms' performance. We compare the results of 10 mainstream tracking algorithms on HiAl and on other datasets, report experiments retraining 6 of them, analyze the limitations of existing algorithms in the high-altitude UAV multi-modal tracking task, and distill potential research directions for researchers' reference. The HiAl dataset is available at https://github.com/mmic-lcl/Datasets-and-benchmark-code/tree/main.
Bhat G, Danelljan M, Van Gool L and Timofte R. 2019. Learning discriminative model prediction for tracking // Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 6181-6190 [DOI: 10.1109/ICCV.2019.00628]
Du D W, Qi Y K, Yu H Y, Yang Y F, Duan K W, Li G R, Zhang W G, Huang Q M and Tian Q. 2018. The unmanned aerial vehicle benchmark: object detection and tracking // Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 375-391 [DOI: 10.1007/978-3-030-01249-6_23]
Gao Z N, Li D D, Wen G J, Kuai Y L and Chen R. 2023. Drone based RGBT tracking with dual-feature aggregation network. Drones, 7(9): #585 [DOI: 10.3390/drones7090585]
Hui T R, Xun Z Z, Peng F G, Huang J S, Wei X M, Wei X L, Dai J, Han J Z and Liu S. 2023. Bridging search region interaction with template for RGB-T tracking // Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 13630-13639 [DOI: 10.1109/CVPR52729.2023.01310]
Jiang N, Wang K R, Peng X K, Yu X H, Wang Q, Xing J L, Li G R, Zhao J, Guo G D and Han Z J. 2021. Anti-UAV: a large multi-modal benchmark for UAV tracking [EB/OL]. [2024-01-24]. https://arxiv.org/pdf/2101.08466v2.pdf
Lan X Y, Ye M, Zhang S P, Zhou H Y and Yuen P C. 2020. Modality-correlation-aware sparse representation for RGB-infrared object tracking. Pattern Recognition Letters, 130: 12-20 [DOI: 10.1016/j.patrec.2018.10.002]
Li B W, Fu C H, Ding F Q, Ye J J and Lin F L. 2023. All-day object tracking for unmanned aerial vehicle. IEEE Transactions on Mobile Computing, 22(8): 4515-4529 [DOI: 10.1109/TMC.2022.3162892]
Li C L, Cheng H, Hu S Y, Liu X B, Tang J and Lin L. 2016. Learning collaborative sparse representation for grayscale-thermal tracking. IEEE Transactions on Image Processing, 25(12): 5743-5756 [DOI: 10.1109/TIP.2016.2614135]
Li C L, Liang X Y, Lu Y J, Zhao N and Tang J. 2019a. RGB-T object tracking: benchmark and baseline. Pattern Recognition, 96: #106977 [DOI: 10.1016/j.patcog.2019.106977]
Li C L, Liu L, Lu A D, Ji Q and Tang J. 2020. Challenge-aware RGBT tracking // Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 222-237 [DOI: 10.1007/978-3-030-58542-6_14]
Li C L, Lu A D, Liu L and Tang J. 2023. Multi-modal visual tracking: a survey. Journal of Image and Graphics, 28(1): 37-56 [DOI: 10.11834/jig.220578]
Li C L, Lu A D, Zheng A H, Tu Z Z and Tang J. 2019b. Multi-adapter RGBT tracking // Proceedings of 2019 IEEE/CVF International Conference on Computer Vision Workshop. Seoul, Korea (South): IEEE: 2262-2270 [DOI: 10.1109/ICCVW.2019.00279]
Li S Y and Yeung D Y. 2017. Visual object tracking for unmanned aerial vehicles: a benchmark and new motion models // Proceedings of the 31st AAAI Conference on Artificial Intelligence. San Francisco, USA: AAAI Press: 4140-4146 [DOI: 10.1609/AAAI.V31I1.11205]
Lin T Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P and Zitnick C L. 2014. Microsoft COCO: common objects in context // Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 740-755 [DOI: 10.1007/978-3-319-10602-1_48]
Lu A D, Li C L, Yan Y Q, Tang J and Luo B. 2021. RGBT tracking via multi-adapter network with hierarchical divergence loss. IEEE Transactions on Image Processing, 30: 5613-5625 [DOI: 10.1109/TIP.2021.3087341]
Lu A D, Qian C, Li C L, Tang J and Wang L. 2022. Duality-gated mutual condition network for RGBT tracking. IEEE Transactions on Neural Networks and Learning Systems: 1-14 [DOI: 10.1109/TNNLS.2022.3157594]
Mueller M, Smith N and Ghanem B. 2016. A benchmark and simulator for UAV tracking // Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 445-461 [DOI: 10.1007/978-3-319-46448-0_27]
Tu Z Z, Lin C, Zhao W, Li C L and Tang J. 2021. M5L: multi-modal multi-margin metric learning for RGBT tracking. IEEE Transactions on Image Processing, 31: 85-98 [DOI: 10.1109/TIP.2021.3125504]
Wang H Y, Liu X T, Li Y F, Sun M, Yuan D and Liu J. 2024. Temporal adaptive RGBT tracking with modality prompt // Proceedings of the 38th AAAI Conference on Artificial Intelligence. Washington, USA: AAAI Press: 5436-5444
Xiao Y, Yang M M, Li C L, Liu L and Tang J. 2022. Attribute-based progressive fusion network for RGBT tracking // Proceedings of the 36th AAAI Conference on Artificial Intelligence. Menlo Park, USA: AAAI Press: 2831-2838 [DOI: 10.1609/AAAI.V36I3.20187]
Yang J Y, Gao S, Li Z, Zheng F and Leonardis A. 2023. Resource-efficient RGBD aerial tracking // Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 13374-13383 [DOI: 10.1109/CVPR52729.2023.01285]
Ye J, Fu C H, Zheng G Z, Cao Z and Li B. 2021. DarkLighter: light up the darkness for UAV tracking // Proceedings of 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems. Prague, Czech Republic: IEEE: 3079-3085 [DOI: 10.1109/IROS51168.2021.9636680]
Yun X, Sun Y J, Yang X X and Lu N N. 2019. Discriminative fusion correlation learning for visible and infrared tracking. Mathematical Problems in Engineering, 2019: #2437521 [DOI: 10.1155/2019/2437521]
Zhai S L, Shao P P, Liang X Y and Wang X. 2019. Fast RGB-T tracking via cross-modal correlation filters. Neurocomputing, 334: 172-181 [DOI: 10.1016/j.neucom.2019.01.022]
Zhang C H, Huang G J, Liu L, Huang S, Yang Y N, Wan X, Ge S M and Tao D C. 2022a. WebUAV-3M: a benchmark for unveiling the power of million-scale deep UAV tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(7): 9186-9205 [DOI: 10.1109/TPAMI.2022.3232854]
Zhang H, Zhang L, Zhuo L and Zhang J. 2020. Object tracking in RGB-T videos using modal-aware attention network and competitive learning. Sensors, 20(2): #393 [DOI: 10.3390/s20020393]
Zhang P Y, Wang D, Lu H C and Yang X Y. 2021. Learning adaptive attribute-driven representation for real-time RGB-T tracking. International Journal of Computer Vision, 129(9): 2714-2729 [DOI: 10.1007/s11263-021-01495-3]
Zhang P Y, Zhao J, Wang D, Lu H C and Ruan X. 2022b. Visible-thermal UAV tracking: a large-scale benchmark and new baseline // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 8876-8885 [DOI: 10.1109/CVPR52688.2022.00868]
Zhu J W, Lai S M, Chen X, Wang D and Lu H C. 2023. Visual prompt multi-modal tracking // Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 9516-9526 [DOI: 10.1109/CVPR52729.2023.00918]
Zhu P F, Wen L Y, Du D W, Bian X, Fan H, Hu Q H and Ling H B. 2021. Detection and tracking meet drones challenge. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(11): 7380-7399 [DOI: 10.1109/TPAMI.2021.3119563]
Zhu Y B, Li C L, Tang J and Luo B. 2020. Quality-aware feature aggregation network for robust RGBT tracking. IEEE Transactions on Intelligent Vehicles, 6(1): 121-130 [DOI: 10.1109/TIV.2020.2980735]
Zhuo L, Zhang S Y, Zhang H and Li J F. 2021. Survey on techniques of single object tracking in unmanned aerial vehicle imagery. Journal of Beijing University of Technology, 47(10): 1174-1187 [DOI: 10.11936/bjutxb2020030017]