Multi-modal visual tracking: a survey
2023, Vol. 28, No. 1, Pages 37-56
Received: 2022-06-02; Revised: 2022-07-08; Accepted: 2022-07-15; Published in print: 2023-01-16
DOI: 10.11834/jig.220578
Object tracking is a frontier and hotspot problem in computer vision research, with important application value in fields such as security surveillance and autonomous driving. However, current visual tracking methods based on visible-light data struggle to achieve robust tracking under illumination changes and adverse weather, because the quality of the data is limited. Some researchers have therefore proposed the multi-modal visual tracking task, which introduces data of other modalities, including the thermal infrared, depth, event, and language modalities, to compensate to some extent for the deficiencies of the visible modality under adverse weather, occlusion, fast motion, and appearance ambiguity. Multi-modal visual tracking aims to exploit the complementary advantages of visible and other modality data to achieve robust object localization in videos; it is of great value for all-day, all-weather perception and has attracted increasing research attention. Since mainstream multi-modal visual tracking methods focus on visible-infrared tracking, this survey mainly reviews visible-infrared tracking methods. From the perspective of information fusion, we divide existing methods into combinative fusion and discriminative fusion, introduce and analyze each category in detail, and compare the advantages and disadvantages of the different categories. We then introduce research on the other multi-modal visual tracking tasks and compare their respective strengths and weaknesses. Finally, we summarize multi-modal visual tracking methods and discuss future directions.
Visual tracking has been one of the key tasks in computer vision over the past decades, with applications such as surveillance, robotics, and autonomous driving. Its performance is still challenged by the quality of visible-light data in adverse scenes such as low illumination, background clutter, haze, and smog. To deal with the imaging constraints of visible-light data, current research commonly introduces additional modalities: integrating visible data with thermal infrared, depth, event, or language data can effectively improve tracking performance. Benefiting from the complementary capabilities of visible and other modality data, multi-modal trackers have developed rapidly for complicated scenarios such as low illumination, occlusion, fast motion, and appearance ambiguity.
This survey focuses on RGB-thermal infrared (RGBT) tracking algorithms, since visible-infrared tracking is the most popular direction in multi-modal visual tracking. Existing surveys of multi-modal visual tracking usually categorize algorithms by tracking framework or by fusion level, i.e., pixel-level, feature-level, and decision-level fusion. Because information fusion plays the key role in multi-modal visual tracking, we instead divide and analyze existing RGBT tracking methods from the perspective of information fusion, into combinative fusion and discriminative fusion. Combinative fusion combines all the multimodal information through different fusion models, including: 1) sparse representation fusion, 2) collaborative graph representation fusion, 3) modality-shared and modality-specific information fusion, and 4) attribute-based feature decoupling fusion.
First, sparse representation fusion suppresses feature noise well, but most of these algorithms are restricted by the time-consuming online optimization of sparse representation models. In addition, these methods represent the target with raw pixel values and thus have low robustness in complex scenes.
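As a concrete illustration of this family, the following minimal sketch (a simplification of ours, not the exact formulation of any cited tracker) jointly sparse-codes a candidate over visible and thermal template dictionaries with one shared coefficient vector, solved by ISTA; the joint reconstruction error then scores the candidate.

```python
# Joint sparse representation fusion, minimal sketch.
# Assumptions: both modalities share one coefficient vector w, the
# template dictionaries D_v, D_t are given, and lam/n_iter are
# illustrative defaults.
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def joint_sparse_code(y_v, y_t, D_v, D_t, lam=0.05, n_iter=200):
    """Solve min_w 0.5*||y_v - D_v w||^2 + 0.5*||y_t - D_t w||^2 + lam*||w||_1
    by ISTA, with the sparsity pattern shared across the two modalities."""
    D = np.vstack([D_v, D_t])          # stack modality dictionaries
    y = np.concatenate([y_v, y_t])     # stack modality observations
    L = np.linalg.norm(D, 2) ** 2      # Lipschitz constant of the gradient
    w = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ w - y)       # gradient of the quadratic term
        w = soft_threshold(w - grad / L, lam / L)
    return w

def candidate_score(y_v, y_t, D_v, D_t):
    """Lower joint reconstruction error -> more target-like candidate."""
    w = joint_sparse_code(y_v, y_t, D_v, D_t)
    err = np.sum((y_v - D_v @ w) ** 2) + np.sum((y_t - D_t @ w) ** 2)
    return -err
```

The per-candidate ISTA loop is exactly the kind of online optimization cost that the drawback above refers to.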
Second, collaborative graph representation fusion suppresses the effect of background clutter through modality weights and local image-patch weights. However, these methods require iterative optimization over multiple variables, so their tracking efficiency is quite low. Furthermore, these models rely on color and gradient features, which are better than raw pixel values but still struggle in challenging scenarios.
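The toy sketch below shows the alternating style of such graph models under simplified assumptions of ours: patch weights are propagated on a fused affinity graph, while modality weights are re-estimated from how smoothly each modality's graph supports the current patch weights. The graph construction and update rules are illustrative rather than those of a specific published model.

```python
# Collaborative graph representation fusion, toy sketch: alternating
# updates of patch weights s and modality weights m on patch graphs.
import numpy as np

def affinity(features, sigma=1.0):
    """Gaussian affinity between local patch features of one modality."""
    d2 = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def collaborative_patch_weights(feat_v, feat_t, prior, n_iter=10, alpha=0.8):
    """`prior` encodes foreground seeds (e.g. patches near the previous
    target center). Alternate: propagate s on the fused graph, then give
    larger weight to the modality whose graph keeps s smooth."""
    graphs = [affinity(feat_v), affinity(feat_t)]
    graphs = [W / W.sum(axis=1, keepdims=True) for W in graphs]  # row-normalize
    m = np.array([0.5, 0.5])                      # modality weights
    s = prior.copy()                              # patch weights
    for _ in range(n_iter):
        W = m[0] * graphs[0] + m[1] * graphs[1]   # fused graph
        s = alpha * W @ s + (1 - alpha) * prior   # smoothness + prior fit
        fit = np.array([s @ Wk @ s for Wk in graphs])
        m = fit / fit.sum()                       # re-estimate reliabilities
    return s, m
```

Even in this toy form, every frame needs an inner loop over two coupled variables, which mirrors the efficiency drawback noted above.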
Third, modality-shared and modality-specific information fusion models shared and specific representations with different sub-networks and provides an effective fusion strategy for tracking. However, these methods lack information interaction in learning the modality-specific representations and thus easily introduce noise and redundancy.
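A minimal PyTorch-style sketch of this two-branch design is given below; the layer sizes, the assumption that both inputs have the same channel count, and the concatenation-based fusion are illustrative choices of ours.

```python
# Modality-shared + modality-specific sub-networks, minimal sketch.
import torch
import torch.nn as nn

class SharedSpecificFusion(nn.Module):
    def __init__(self, c_in=3, c_feat=64):
        super().__init__()
        # modality-shared sub-network: captures collaborative cues
        self.shared = nn.Sequential(
            nn.Conv2d(c_in, c_feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(c_feat, c_feat, 3, padding=1), nn.ReLU())
        # modality-specific adapters: capture heterogeneous cues
        self.spec_v = nn.Conv2d(c_in, c_feat, 1)
        self.spec_t = nn.Conv2d(c_in, c_feat, 1)
        self.fuse = nn.Conv2d(4 * c_feat, c_feat, 1)

    def forward(self, x_v, x_t):
        feats = [self.shared(x_v), self.shared(x_t),   # shared representations
                 torch.relu(self.spec_v(x_v)),         # visible-specific
                 torch.relu(self.spec_t(x_t))]         # thermal-specific
        # note: the two specific branches never interact here, which is
        # exactly the missing-interaction drawback discussed above
        return self.fuse(torch.cat(feats, dim=1))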
Fourth, attribute-based feature decoupling fusion models the target representations under different challenge attributes, which further alleviates the dependence on large-scale training data; however, a fixed attribute set can hardly cover all the challenging problems of practical applications.
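The hedged sketch below decouples the fused feature into per-attribute branches and re-aggregates them with a learned gate; the attribute set, branch design, and gating scheme are assumptions for illustration, not the architecture of a specific tracker.

```python
# Attribute-based feature decoupling, minimal sketch.
import torch
import torch.nn as nn

class AttributeDecoupledFusion(nn.Module):
    def __init__(self, c=64,
                 attributes=("occlusion", "illumination", "fast_motion")):
        super().__init__()
        # one lightweight branch per challenge attribute
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Conv2d(2 * c, c, 1), nn.ReLU())
            for _ in attributes)
        # gate predicts how relevant each attribute branch is per frame
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(2 * c, len(attributes)), nn.Softmax(dim=1))

    def forward(self, f_v, f_t):
        x = torch.cat([f_v, f_t], dim=1)
        g = self.gate(x)                                   # (B, n_attr)
        outs = torch.stack([b(x) for b in self.branches],
                           dim=1)                          # (B, n_attr, C, H, W)
        return (g[:, :, None, None, None] * outs).sum(dim=1)
```

Because each branch only has to model one attribute, each can be small and trained with attribute-specific data, but any challenge outside the predefined set falls through the decomposition.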
Although these combinative fusion methods achieve good tracking performance, combining all the multimodal information inevitably introduces information redundancy and feature noise. To resolve these problems, some studies focus on discriminative fusion for RGBT tracking. This kind of fusion aims to mine the discriminative features of each modality for effective and efficient information fusion, including: 1) feature selection fusion, 2) attention-based adaptive fusion, 3) mutual enhancement fusion, and 4) other discriminative fusion methods.
Feature selection fusion selects discriminative features according to predefined rules; it not only avoids the interference of data noise, which benefits tracking performance, but also eliminates data redundancy. However, the selection criteria are hard to design, and an unsuitable criterion often removes useful information from low-quality data and thus limits tracking performance.
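As a toy example of why hand-designed criteria are brittle, the sketch below keeps only the feature channels whose predicted quality exceeds a hard threshold; the quality head and the threshold rule are assumptions of ours.

```python
# Feature selection fusion, toy sketch with a hard selection rule.
import torch
import torch.nn as nn

class FeatureSelectionFusion(nn.Module):
    def __init__(self, c=64, keep_thresh=0.5):
        super().__init__()
        self.keep_thresh = keep_thresh
        # predicts a per-channel quality score for a feature map
        self.quality = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(c, c), nn.Sigmoid())

    def forward(self, f_v, f_t):
        out = []
        for f in (f_v, f_t):
            q = self.quality(f)                        # (B, C) channel quality
            # hard, non-differentiable rule: an ill-chosen threshold can
            # also discard informative channels of a low-quality modality
            mask = (q > self.keep_thresh).float()
            out.append(f * mask[:, :, None, None])
        return torch.cat(out, dim=1)                   # fuse surviving channels
```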
Attention-based adaptive fusion estimates the reliability of multi-modal data with attention mechanisms, including the modality, spatial, and channel reliabilities, and thereby fuses multi-modal information adaptively. However, there is no explicit supervision for generating these reliability weights, so the estimates can be misled in complex scenarios.
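The following sketch combines generic channel, spatial, and modality attention blocks into one adaptive fusion module; the SE-style and convolutional attention designs are common building blocks we assume for illustration. Note that the reliability weights are trained only through the tracking loss, with no direct supervision, as discussed above.

```python
# Attention-based adaptive fusion, minimal sketch.
import torch
import torch.nn as nn

class AdaptiveAttentionFusion(nn.Module):
    def __init__(self, c=64, r=4):
        super().__init__()
        self.channel = nn.Sequential(        # channel reliability (SE-style)
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c, c // r, 1), nn.ReLU(),
            nn.Conv2d(c // r, c, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(        # spatial reliability map
            nn.Conv2d(c, 1, 7, padding=3), nn.Sigmoid())
        self.modality = nn.Sequential(       # global modality reliability
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(2 * c, 2), nn.Softmax(dim=1))

    def forward(self, f_v, f_t):
        w = self.modality(torch.cat([f_v, f_t], dim=1))   # (B, 2)
        fused = 0.0
        for i, f in enumerate((f_v, f_t)):
            f = f * self.channel(f) * self.spatial(f)     # local reliabilities
            fused = fused + w[:, i].view(-1, 1, 1, 1) * f # modality weighting
        return fused
```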
Mutual enhancement fusion suppresses the data noise of a low-quality modality and enhances its features with the discriminative information of the other modality. Such methods mine the modality-specific information of both modalities and improve the target representations of the low-quality one, but the resulting models are complicated and their tracking efficiency is low.
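A gated-residual sketch of such cross-modality enhancement is shown below: each modality injects hints into the other where a learned gate judges its own features to be weak. The design is assumed by us for illustration.

```python
# Mutual enhancement fusion, toy gated-residual sketch.
import torch
import torch.nn as nn

class MutualEnhancement(nn.Module):
    def __init__(self, c=64):
        super().__init__()
        self.v2t = nn.Conv2d(c, c, 3, padding=1)   # visible -> thermal hints
        self.t2v = nn.Conv2d(c, c, 3, padding=1)   # thermal -> visible hints
        self.gate_v = nn.Sequential(nn.Conv2d(c, c, 1), nn.Sigmoid())
        self.gate_t = nn.Sequential(nn.Conv2d(c, c, 1), nn.Sigmoid())

    def forward(self, f_v, f_t):
        # each gate decides where its own modality needs help from its partner
        f_v2 = f_v + self.gate_v(f_v) * self.t2v(f_t)
        f_t2 = f_t + self.gate_t(f_t) * self.v2t(f_v)
        return f_v2, f_t2
```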
Besides RGBT tracking, multi-modal visual tracking has three other sub-tasks: 1) visible-depth (RGB and depth, RGBD) tracking, 2) visible-event (RGB and event, RGBE) tracking, and 3) visible-language (RGB and language, RGBL) tracking. We briefly review these three tasks as well. Finally, we discuss open challenges and future research directions for multi-modal visual tracking.