Multi-modal visual tracking: a survey
2023, Vol. 28, No. 1, Pages 37-56
Received: 2022-06-02; Revised: 2022-07-08; Accepted: 2022-07-15; Published in print: 2023-01-16
DOI: 10.11834/jig.220578
Object tracking is a frontier and hotspot problem in computer vision research, with important application value in fields such as security surveillance and autonomous driving. However, current visual tracking methods based on visible-light data struggle to achieve robust tracking under illumination changes and adverse weather, because the quality of the data is limited. Some researchers have therefore proposed the multi-modal visual tracking task, which introduces data of other modalities, including the thermal infrared, depth, event, and language modalities, to compensate to some extent for the deficiencies of the visible modality under adverse weather, occlusion, fast motion, and appearance ambiguity. Multi-modal visual tracking aims to exploit the complementary advantages of visible and other modality data to achieve robust object localization in videos; it is of great value for all-day, all-weather perception and has attracted increasing research attention. Since mainstream multi-modal visual tracking methods focus on visible-infrared tracking, this survey mainly reviews visible-infrared tracking methods. From the perspective of information fusion, we divide existing methods into combinative fusion and discriminative fusion, introduce and analyze each category in detail, and compare the advantages and disadvantages of the different categories. We then introduce research on the other multi-modal visual tracking tasks and compare their respective strengths and weaknesses. Finally, we summarize multi-modal visual tracking methods and discuss future directions.
Visual tracking has been one of the key tasks in computer vision over the past decades, with applications such as surveillance, robotics, and autonomous driving. Its performance is still challenged by the quality of visible-light data in adverse scenes such as low illumination, background clutter, haze, and smog. To deal with the imaging constraints of visible-light data, current research commonly introduces additional modalities: integrating visible data with thermal infrared, depth, event, or language data can effectively improve tracking performance. Benefiting from the complementary capabilities of visible and other modality data, multi-modal trackers have developed rapidly for complicated scenarios such as low illumination, occlusion, fast motion, and appearance ambiguity.
This survey focuses on RGB-thermal infrared (RGBT) tracking algorithms, since visible-infrared tracking is the most popular direction in multi-modal visual tracking. Existing surveys of multi-modal visual tracking usually categorize algorithms by tracking framework or by fusion level, i.e., pixel-level, feature-level, and decision-level fusion. Because information fusion plays the key role in multi-modal visual tracking, we instead divide and analyze existing RGBT tracking methods from the perspective of information fusion, into combinative fusion and discriminative fusion. Combinative fusion combines all the multimodal information through different fusion models, including: 1) sparse representation fusion, 2) collaborative graph representation fusion, 3) modality-shared and modality-specific information fusion, and 4) attribute-based feature decoupling fusion.
First, sparse representation fusion suppresses feature noise well, but most of these algorithms are restricted by the time-consuming online optimization of sparse representation models. In addition, these methods represent the target with raw pixel values and thus have low robustness in complex scenes.
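As a concrete illustration of this family, the following minimal sketch (a simplification of ours, not the exact formulation of any cited tracker) jointly sparse-codes a candidate over visible and thermal template dictionaries with one shared coefficient vector, solved by ISTA; the joint reconstruction error then scores the candidate.

```python
# Joint sparse representation fusion, minimal sketch.
# Assumptions: both modalities share one coefficient vector w, the
# template dictionaries D_v, D_t are given, and lam/n_iter are
# illustrative defaults.
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def joint_sparse_code(y_v, y_t, D_v, D_t, lam=0.05, n_iter=200):
    """Solve min_w 0.5*||y_v - D_v w||^2 + 0.5*||y_t - D_t w||^2 + lam*||w||_1
    by ISTA, with the sparsity pattern shared across the two modalities."""
    D = np.vstack([D_v, D_t])          # stack modality dictionaries
    y = np.concatenate([y_v, y_t])     # stack modality observations
    L = np.linalg.norm(D, 2) ** 2      # Lipschitz constant of the gradient
    w = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ w - y)       # gradient of the quadratic term
        w = soft_threshold(w - grad / L, lam / L)
    return w

def candidate_score(y_v, y_t, D_v, D_t):
    """Lower joint reconstruction error -> more target-like candidate."""
    w = joint_sparse_code(y_v, y_t, D_v, D_t)
    err = np.sum((y_v - D_v @ w) ** 2) + np.sum((y_t - D_t @ w) ** 2)
    return -err
```

The per-candidate ISTA loop is exactly the kind of online optimization cost that the drawback above refers to.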
Second, collaborative graph representation fusion suppresses the effect of background clutter through modality weights and local image-patch weights. However, these methods require iterative optimization over multiple variables, so their tracking efficiency is quite low. Furthermore, these models rely on color and gradient features, which are better than raw pixel values but still struggle in challenging scenarios.
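The toy sketch below shows the alternating style of such graph models under simplified assumptions of ours: patch weights are propagated on a fused affinity graph, while modality weights are re-estimated from how smoothly each modality's graph supports the current patch weights. The graph construction and update rules are illustrative rather than those of a specific published model.

```python
# Collaborative graph representation fusion, toy sketch: alternating
# updates of patch weights s and modality weights m on patch graphs.
import numpy as np

def affinity(features, sigma=1.0):
    """Gaussian affinity between local patch features of one modality."""
    d2 = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def collaborative_patch_weights(feat_v, feat_t, prior, n_iter=10, alpha=0.8):
    """`prior` encodes foreground seeds (e.g. patches near the previous
    target center). Alternate: propagate s on the fused graph, then give
    larger weight to the modality whose graph keeps s smooth."""
    graphs = [affinity(feat_v), affinity(feat_t)]
    graphs = [W / W.sum(axis=1, keepdims=True) for W in graphs]  # row-normalize
    m = np.array([0.5, 0.5])                      # modality weights
    s = prior.copy()                              # patch weights
    for _ in range(n_iter):
        W = m[0] * graphs[0] + m[1] * graphs[1]   # fused graph
        s = alpha * W @ s + (1 - alpha) * prior   # smoothness + prior fit
        fit = np.array([s @ Wk @ s for Wk in graphs])
        m = fit / fit.sum()                       # re-estimate reliabilities
    return s, m
```

Even in this toy form, every frame needs an inner loop over two coupled variables, which mirrors the efficiency drawback noted above.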
Third, modality-shared and modality-specific information fusion models shared and specific representations with different sub-networks and provides an effective fusion strategy for tracking. However, these methods lack information interaction in learning the modality-specific representations and thus easily introduce noise and redundancy.
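A minimal PyTorch-style sketch of this two-branch design is given below; the layer sizes, the assumption that both inputs have the same channel count, and the concatenation-based fusion are illustrative choices of ours.

```python
# Modality-shared + modality-specific sub-networks, minimal sketch.
import torch
import torch.nn as nn

class SharedSpecificFusion(nn.Module):
    def __init__(self, c_in=3, c_feat=64):
        super().__init__()
        # modality-shared sub-network: captures collaborative cues
        self.shared = nn.Sequential(
            nn.Conv2d(c_in, c_feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(c_feat, c_feat, 3, padding=1), nn.ReLU())
        # modality-specific adapters: capture heterogeneous cues
        self.spec_v = nn.Conv2d(c_in, c_feat, 1)
        self.spec_t = nn.Conv2d(c_in, c_feat, 1)
        self.fuse = nn.Conv2d(4 * c_feat, c_feat, 1)

    def forward(self, x_v, x_t):
        feats = [self.shared(x_v), self.shared(x_t),   # shared representations
                 torch.relu(self.spec_v(x_v)),         # visible-specific
                 torch.relu(self.spec_t(x_t))]         # thermal-specific
        # note: the two specific branches never interact here, which is
        # exactly the missing-interaction drawback discussed above
        return self.fuse(torch.cat(feats, dim=1))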
Fourth, attribute-based feature decoupling fusion models the target representations under different challenge attributes, which further alleviates the dependence on large-scale training data; however, a fixed attribute set can hardly cover all the challenging problems of practical applications.
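The hedged sketch below decouples the fused feature into per-attribute branches and re-aggregates them with a learned gate; the attribute set, branch design, and gating scheme are assumptions for illustration, not the architecture of a specific tracker.

```python
# Attribute-based feature decoupling, minimal sketch.
import torch
import torch.nn as nn

class AttributeDecoupledFusion(nn.Module):
    def __init__(self, c=64,
                 attributes=("occlusion", "illumination", "fast_motion")):
        super().__init__()
        # one lightweight branch per challenge attribute
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Conv2d(2 * c, c, 1), nn.ReLU())
            for _ in attributes)
        # gate predicts how relevant each attribute branch is per frame
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(2 * c, len(attributes)), nn.Softmax(dim=1))

    def forward(self, f_v, f_t):
        x = torch.cat([f_v, f_t], dim=1)
        g = self.gate(x)                                   # (B, n_attr)
        outs = torch.stack([b(x) for b in self.branches],
                           dim=1)                          # (B, n_attr, C, H, W)
        return (g[:, :, None, None, None] * outs).sum(dim=1)
```

Because each branch only has to model one attribute, each can be small and trained with attribute-specific data, but any challenge outside the predefined set falls through the decomposition.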
Although these combinative fusion methods achieve good tracking performance, combining all the multimodal information inevitably introduces information redundancy and feature noise. To resolve these problems, some studies focus on discriminative fusion for RGBT tracking. This kind of fusion aims to mine the discriminative features of each modality for effective and efficient information fusion, including: 1) feature selection fusion, 2) attention-based adaptive fusion, 3) mutual enhancement fusion, and 4) other discriminative fusion methods.
Feature selection fusion selects discriminative features according to predefined rules; it not only avoids the interference of data noise, which benefits tracking performance, but also eliminates data redundancy. However, the selection criteria are hard to design, and an unsuitable criterion often removes useful information from low-quality data and thus limits tracking performance.
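As a toy example of why hand-designed criteria are brittle, the sketch below keeps only the feature channels whose predicted quality exceeds a hard threshold; the quality head and the threshold rule are assumptions of ours.

```python
# Feature selection fusion, toy sketch with a hard selection rule.
import torch
import torch.nn as nn

class FeatureSelectionFusion(nn.Module):
    def __init__(self, c=64, keep_thresh=0.5):
        super().__init__()
        self.keep_thresh = keep_thresh
        # predicts a per-channel quality score for a feature map
        self.quality = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(c, c), nn.Sigmoid())

    def forward(self, f_v, f_t):
        out = []
        for f in (f_v, f_t):
            q = self.quality(f)                        # (B, C) channel quality
            # hard, non-differentiable rule: an ill-chosen threshold can
            # also discard informative channels of a low-quality modality
            mask = (q > self.keep_thresh).float()
            out.append(f * mask[:, :, None, None])
        return torch.cat(out, dim=1)                   # fuse surviving channels
```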
Attention-based adaptive fusion estimates the reliability of multi-modal data with attention mechanisms, including the modality, spatial, and channel reliabilities, and thereby fuses multi-modal information adaptively. However, there is no explicit supervision for generating these reliability weights, so the estimates can be misled in complex scenarios.
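The following sketch combines generic channel, spatial, and modality attention blocks into one adaptive fusion module; the SE-style and convolutional attention designs are common building blocks we assume for illustration. Note that the reliability weights are trained only through the tracking loss, with no direct supervision, as discussed above.

```python
# Attention-based adaptive fusion, minimal sketch.
import torch
import torch.nn as nn

class AdaptiveAttentionFusion(nn.Module):
    def __init__(self, c=64, r=4):
        super().__init__()
        self.channel = nn.Sequential(        # channel reliability (SE-style)
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c, c // r, 1), nn.ReLU(),
            nn.Conv2d(c // r, c, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(        # spatial reliability map
            nn.Conv2d(c, 1, 7, padding=3), nn.Sigmoid())
        self.modality = nn.Sequential(       # global modality reliability
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(2 * c, 2), nn.Softmax(dim=1))

    def forward(self, f_v, f_t):
        w = self.modality(torch.cat([f_v, f_t], dim=1))   # (B, 2)
        fused = 0.0
        for i, f in enumerate((f_v, f_t)):
            f = f * self.channel(f) * self.spatial(f)     # local reliabilities
            fused = fused + w[:, i].view(-1, 1, 1, 1) * f # modality weighting
        return fused
```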
Mutual enhancement fusion suppresses the data noise of a low-quality modality and enhances its features with the discriminative information of the other modality. Such methods mine the modality-specific information of both modalities and improve the target representations of the low-quality one, but the resulting models are complicated and their tracking efficiency is low.
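A gated-residual sketch of such cross-modality enhancement is shown below: each modality injects hints into the other where a learned gate judges its own features to be weak. The design is assumed by us for illustration.

```python
# Mutual enhancement fusion, toy gated-residual sketch.
import torch
import torch.nn as nn

class MutualEnhancement(nn.Module):
    def __init__(self, c=64):
        super().__init__()
        self.v2t = nn.Conv2d(c, c, 3, padding=1)   # visible -> thermal hints
        self.t2v = nn.Conv2d(c, c, 3, padding=1)   # thermal -> visible hints
        self.gate_v = nn.Sequential(nn.Conv2d(c, c, 1), nn.Sigmoid())
        self.gate_t = nn.Sequential(nn.Conv2d(c, c, 1), nn.Sigmoid())

    def forward(self, f_v, f_t):
        # each gate decides where its own modality needs help from its partner
        f_v2 = f_v + self.gate_v(f_v) * self.t2v(f_t)
        f_t2 = f_t + self.gate_t(f_t) * self.v2t(f_v)
        return f_v2, f_t2
```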
Besides RGBT tracking, multi-modal visual tracking has three other sub-tasks: 1) visible-depth (RGB and depth, RGBD) tracking, 2) visible-event (RGB and event, RGBE) tracking, and 3) visible-language (RGB and language, RGBL) tracking. We briefly review these three tasks as well. Finally, we discuss open challenges and future research directions for multi-modal visual tracking.