Visual intelligence evaluation techniques for single object tracking: a survey

Hu Shiyu; Zhao Xin; Huang Kaiqi

doi:10.11834/jig.230498

Survey

2024, Vol. 29, No. 8, pp. 2269-2302
Received: 2023-07-10; revised: 2023-10-07; published in print: 2024-08-16

Hu Shiyu, Zhao Xin, Huang Kaiqi. 2024. Visual intelligence evaluation techniques for single object tracking: a survey. Journal of Image and Graphics, 29(08): 2269-2302. DOI: 10.11834/jig.230498.

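Summary

The single object tracking (SOT) task aims to model the human dynamic vision system and to give machines a human-like ability to track moving targets in complex environments. It is widely applied in autonomous driving, video surveillance, robot vision, and related fields. Researchers have carried out extensive work on algorithm design, and the resulting trackers perform well on representative datasets. However, when facing challenging factors such as target deformation, fast motion, and illumination changes, the tracking performance of existing algorithms still falls well short of human expectations, which reveals that current evaluation techniques lag behind and remain limited. Therefore, unlike traditional surveys centered on algorithm design, this paper builds on the SOT task and, starting from visual intelligence evaluation techniques, systematically reviews the key components of the evaluation pipeline: the evaluation task, the evaluation environment, the subjects under test, and the evaluation mechanism. First, it introduces the development history and challenging factors of the SOT task and compares in detail the evaluation environments that evaluation requires (datasets, competitions, and so on). Second, it introduces the subjects under test in SOT, which include not only tracking algorithms represented by correlation filters and Siamese neural networks but also human visual tracking experiments conducted in interdisciplinary fields. Finally, it reviews SOT evaluation mechanisms from two perspectives, machine-versus-machine and human-versus-machine comparison, and analyzes and summarizes the tracking ability of current subjects under test. On this basis, it summarizes and looks ahead to the development trends of intelligent evaluation for SOT, further analyzes the challenges that future research faces, and discusses possible next research directions.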

Abstract

Single object tracking (SOT), which aims to model the human dynamic vision system and achieve human-like object tracking in complex environments, is widely used in real-world applications such as self-driving, video surveillance, and robot vision. Over the past decade, advances in deep learning have encouraged many research groups to design tracking frameworks such as correlation filters (CFs) and Siamese neural networks (SNNs), which have driven progress in SOT research. However, many factors in natural application scenes (e.g., target deformation, fast motion, and illumination changes) still challenge SOT trackers. Algorithms with novel architectures have therefore been proposed to track more robustly and to perform better in representative experimental environments. Nevertheless, failure cases in natural application environments reveal a large gap between the performance of state-of-the-art trackers and human expectations, which motivates us to pay close attention to evaluation. Therefore, unlike traditional reviews that concentrate mainly on algorithm design, this study systematically reviews the visual intelligence evaluation techniques for SOT, covering four key aspects: the task definition, evaluation environments, task executors, and evaluation mechanisms.

First, we present how the task definition has developed, from the original short-term tracking to long-term tracking and the recently proposed global instance tracking. As the SOT definition has evolved, research has progressed from perceptual to cognitive intelligence. We also summarize the challenging factors in the SOT task to help readers understand the research bottlenecks in actual applications. Second, we compare the representative experimental environments used in SOT evaluation. Unlike existing reviews that mainly introduce datasets in chronological order, this study divides the environments into three categories (i.e., general datasets, dedicated datasets, and competition datasets) and introduces them separately. Third, we introduce the executors of SOT tasks, which include not only tracking algorithms (traditional trackers, CF-based trackers, SNN-based trackers, and Transformer-based trackers) but also human visual tracking experiments conducted in interdisciplinary fields. To our knowledge, no existing SOT review has included related work on human dynamic visual ability. Introducing this interdisciplinary work supports visual intelligence evaluation by allowing machines to be compared with humans and better reveals how intelligent existing algorithmic modeling methods are. Fourth, we review the evaluation mechanisms and metrics, which encompass traditional machine-machine comparisons and novel human-machine comparisons, and analyze the target tracking capability of the various task executors. We also provide an overview of the human-machine comparison known as the visual Turing test, including its application in many vision tasks (e.g., image comprehension, game navigation, image classification, and image recognition). In particular, we hope that this study can help researchers focus on this novel evaluation technique, better understand the capability bottlenecks, further explore the gaps between existing methods and humans, and finally achieve the goal of algorithmic intelligence.

Finally, we indicate the evolution trend of visual intelligence evaluation techniques: 1) designing more human-like task definitions, 2) constructing more comprehensive and realistic evaluation environments, 3) including human subjects as task executors, and 4) using human abilities as a baseline to evaluate machine intelligence. In conclusion, this study summarizes the evolution trend of visual intelligence evaluation techniques for the SOT task, further analyzes the existing challenges, and discusses possible future research directions.

references

Bao C L ， Wu Y ， Ling H B and Ji H . 2012 . Real time robust L1 tracker using accelerated proximal gradient approach // Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition . Providence， USA ： IEEE： 1830 - 1837 ［ DOI： 10.1109/CVPR.2012.6247881 http://dx.doi.org/10.1109/CVPR.2012.6247881 ］

Bertinetto L ， Valmadre J ， Henriques J F ， Vedaldi A and Torr P H S . 2016 . Fully-convolutional siamese networks for object tracking // Proceedings of 2016 European Conference on Computer Vision . Amsterdam， the Netherlands ： Springer： 850 - 865 ［ DOI： 10.1007/978-3-319-48881-3_56 http://dx.doi.org/10.1007/978-3-319-48881-3_56 ］

Bhat G ， Danelljan M ， van Gool L and Timofte R . 2019 . Learning discriminative model prediction for tracking // Proceedings of 2019 IEEE/CVF International Conference on Computer Vision . Seoul， Korea （South）： IEEE： 6181 - 6190 ［ DOI： 10.1109/ICCV.2019.00628 http://dx.doi.org/10.1109/ICCV.2019.00628 ］

Bhat G ， Danelljan M ， van Gool L and Timofte R . 2020 . Know your surroundings： exploiting scene information for object tracking // Proceedings of the 16th European Conference on Computer Vision . Glasgow， UK ： Springer： 205 - 221 ［ DOI： 10.1007/978-3-030-58592-1_13 http://dx.doi.org/10.1007/978-3-030-58592-1_13 ］

Bhat G ， Johnander J ， Danelljan M ， Khan F S and Felsberg M . 2018 . Unveiling the power of deep tracking // Proceedings of the 15th European Conference on Computer Vision . Munich， Germany ： Springer： 493 - 509 ［ DOI： 10.1007/978-3-030-01216-8_30 http://dx.doi.org/10.1007/978-3-030-01216-8_30 ］

Biederman I . 1987 . Recognition-by-components： a theory of human image understanding . Psychological Review ， 94 （ 2 ）： 115 - 147 ［ DOI： 10.1037/0033-295X.94.2.115 http://dx.doi.org/10.1037/0033-295X.94.2.115 ］

Bolme D S ， Beveridge J R ， Draper B A and Lui Y M . 2010 . Visual object tracking using adaptive correlation filters // Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition . San Francisco， USA ： IEEE： 2544 - 2550 ［ DOI： 10.1109/CVPR.2010.5539960 http://dx.doi.org/10.1109/CVPR.2010.5539960 ］

Bromley J ， Guyon I ， LeCun Y ， Säckinger E and Shah R . 1993 . Signature verification using a “siamese” time delay neural network // Proceedings of the 6th International Conference on Neural Information Processing Systems . Denver， Colorado， USA ： Morgan Kaufmann Publishers Inc.： 737 - 744

Brown N and Sandholm T . 2018 . Superhuman AI for heads-up no-limit poker： Libratus beats top professionals . Science ， 359 （ 6374 ）： 418 - 424 ［ DOI： 10.1126/science.aao1733 http://dx.doi.org/10.1126/science.aao1733 ］

Burg A and Hulbert S . 1961 . Dynamic visual acuity as related to age， sex， and static acuity . Journal of Applied Psychology ， 45 （ 2 ）： 111 - 116

Čehovin L ， Leonardis A and Kristan M . 2016 . Visual object tracking performance measures revisited . IEEE Transactions on Image Processing ， 25 （ 3 ）： 1261 - 1274 ［ DOI： 10.1109/TIP.2016.2520370 http://dx.doi.org/10.1109/TIP.2016.2520370 ］

Chen L . 1982 . Topological structure in visual perception . Science ， 218 （ 4573 ）： 699 - 700 ［ DOI： 10.1126/science.7134969 http://dx.doi.org/10.1126/science.7134969 ］

Chen X ， Yan B ， Zhu J W ， Wang D ， Yang X Y and Lu H C . 2021 . Transformer tracking // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville， USA ： IEEE： 8122 - 8131 ［ DOI： 10.1109/CVPR46437.2021.00803 http://dx.doi.org/10.1109/CVPR46437.2021.00803 ］

Coulom R . 2007 . Computing “Elo ratings” of move patterns in the game of Go . ICGA Journal ， 30 （ 4 ）： 198 - 208 ［ DOI： 10.3233/ICG-2007-30403 http://dx.doi.org/10.3233/ICG-2007-30403 ］

Cui Y T ， Jiang C ， Wang L M and Wu G S . 2021 . Target transformed regression for accurate tracking ［EB/OL］. ［ 2023-03-14 ］. https://arxiv.org/pdf/2104.00403.pdf https://arxiv.org/pdf/2104.00403.pdf

Cui Y T ， Jiang C ， Wang L M and Wu G S . 2022 . MixFormer： end-to-end tracking with iterative mixed attention // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans， USA ： IEEE： 13598 - 13608 ［ DOI： 10.1109/CVPR52688.2022.01324 http://dx.doi.org/10.1109/CVPR52688.2022.01324 ］

Dai K N ， Zhang Y H ， Wang D ， Li J H ， Lu H C and Yang X Y . 2020 . High-performance long-term tracking with meta-updater // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， USA ： IEEE： 6297 - 6306 ［ DOI： 10.1109/CVPR42600.2020.00633 http://dx.doi.org/10.1109/CVPR42600.2020.00633 ］

Danelljan M ， Bhat G ， Khan F S and Felsberg M . 2017 . ECO： efficient convolution operators for tracking // Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition . Honolulu， USA ： IEEE： 6931 - 6939 ［ DOI： 10.1109/CVPR.2017.733 http://dx.doi.org/10.1109/CVPR.2017.733 ］

Danelljan M ， Bhat G ， Khan F S and Felsberg M . 2019 . ATOM： accurate tracking by overlap maximization // Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Long Beach， USA ： IEEE： 4655 - 4664 ［ DOI： 10.1109/CVPR.2019.00479 http://dx.doi.org/10.1109/CVPR.2019.00479 ］

Danelljan M ， Häger G ， Khan F S and Felsberg M . 2015 . Convolutional features for correlation filter based visual tracking // Proceedings of 2015 IEEE International Conference on Computer Vision Workshop . Santiago， Chile ： IEEE： 621 - 629 ［ DOI： 10.1109/ICCVW.2015.84 http://dx.doi.org/10.1109/ICCVW.2015.84 ］

Danelljan M ， Robinson A ， Khan F S and Felsberg M . 2016 . Beyond correlation filters： learning continuous convolution operators for visual tracking // Proceedings of the 14th European Conference on Computer Vision . Amsterdam， the Netherlands ： Springer： 472 - 488 ［ DOI： 10.1007/978-3-319-46454-1_29 http://dx.doi.org/10.1007/978-3-319-46454-1_29 ］

Danelljan M ， van Gool L and Timofte R . 2020 . Probabilistic regression for visual tracking // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， USA ： IEEE： 7181 - 7190 ［ DOI： 10.1109/CVPR42600.2020.00721 http://dx.doi.org/10.1109/CVPR42600.2020.00721 ］

Devlin S ， Georgescu R ， Momennejad I ， Rzepecki J ， Zuniga E ， Costello G ， Leroy G ， Shaw A and Hofmann K . 2021 . Navigation Turing test （NTT）： learning to evaluate human-like navigation // Proceedings of the 38th International Conference on Machine Learning . Virtual ： PMLR： 2644 - 2653

Erickson G B ， Citek K ， Cove M ， Wilczek J ， Linster C ， Bjarnason B and Langemo N . 2011 . Reliability of a computer-based system for measuring visual performance skills . Optometry——Journal of the American Optometric Association ， 82 （ 9 ）： 528 - 542 ［ DOI： 10.1016/j.optm.2011.01.012 http://dx.doi.org/10.1016/j.optm.2011.01.012 ］

Fan H ， Bai H X ， Lin L T ， Yang F ， Chu P ， Deng G ， Yu S J ， Harshit ， Huang M Z ， Liu J H ， Xu Y ， Liao C Y ， Yuan L and Ling H B . 2021a . LaSOT： a high-quality large-scale single object tracking benchmark . International Journal of Computer Vision ， 129 （ 2 ）： 439 - 461 ［ DOI： 10.1007/s11263-020-01387-y http://dx.doi.org/10.1007/s11263-020-01387-y ］

Fan H ， Miththanthaya H A ， Harshit H ， Rajan S R ， Liu X Q ， Zou Z L ， Lin Y W and Ling H B . 2021b . Transparent object tracking benchmark // Proceedings of 2021 IEEE/CVF International Conference on Computer Vision . Montreal， Canada ： IEEE： 10714 - 10723 ［ DOI： 10.1109/ICCV48922.2021.01056 http://dx.doi.org/10.1109/ICCV48922.2021.01056 ］

Fan H ， Yang F ， Chu P ， Lin Y W ， Yuan L and Ling H B . 2021c . TracKlinic： diagnosis of challenge factors in visual tracking // Proceedings of 2021 IEEE Winter Conference on Applications of Computer Vision . Waikoloa， USA ： IEEE： 969 - 978 ［ DOI： 10.1109/WACV48630.2021.00101 http://dx.doi.org/10.1109/WACV48630.2021.00101 ］

Geirhos R ， Jacobsen J H ， Michaelis C ， Zemel R ， Brendel W ， Bethge M and Wichmann F A . 2020a . Shortcut learning in deep neural networks . Nature Machine Intelligence ， 2 （ 11 ）： 665 - 673 ［ DOI： 10.1038/s42256-020-00257-z http://dx.doi.org/10.1038/s42256-020-00257-z ］

Geirhos R ， Meding K and Wichmann F A . 2020b . Beyond accuracy： quantifying trial-by-trial behaviour of CNNs and humans by measuring error consistency // Proceedings of the 34th International Conference on Neural Information Processing Systems . Vancouver， Canada ： Curran Associates Inc.： 13890 - 13902

Geirhos R ， Narayanappa K ， Mitzkus B ， Thieringer T ， Bethge M ， Wichmann F A and Brendel W . 2021 . Partial success in closing the gap between human and machine vision ［EB/OL］. ［ 2023-07-10 ］. http://arxiv.org/pdf/2106.07411.pdf http://arxiv.org/pdf/2106.07411.pdf

Geirhos R ， Temme C R M ， Rauber J ， Schütt H H ， Bethge M and Wichmann F A . 2018 . Generalisation in humans and deep neural networks // Proceedings of the 32nd International Conference on Neural Information Processing Systems . Montréal， Canada ： Curran Associates Inc.： 7549 - 7561

Geman D ， Geman S ， Hallonquist N and Younes L . 2015 . Visual Turing test for computer vision systems . Proceedings of the National Academy of Sciences of the United States of America ， 112 （ 12 ）： 3618 - 3623 ［ DOI： 10.1073/pnas.1422953112 http://dx.doi.org/10.1073/pnas.1422953112 ］

Ginsburg A P . 1984 . A new contrast sensitivity vision test chart . Optometry and Vision Science ， 61 （ 6 ）： 403 - 407 ［ DOI： 10.1097/00006324-198406000-00011 http://dx.doi.org/10.1097/00006324-198406000-00011 ］

Girshick R . 2015 . Fast R-CNN // Proceedings of 2015 IEEE International Conference on Computer Vision . Santiago， Chile ： IEEE： 1440 - 1448 ［ DOI： 10.1109/ICCV.2015.169 http://dx.doi.org/10.1109/ICCV.2015.169 ］

Guo D Y ， Wang J ， Cui Y ， Wang Z H and Chen S Y . 2020 . SiamCAR： siamese fully convolutional classification and regression for visual tracking // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， USA ： IEEE： 6268 - 6276 ［ DOI： 10.1109/CVPR42600.2020.00630 http://dx.doi.org/10.1109/CVPR42600.2020.00630 ］

Han R Z ， Feng W ， Guo Q and Hu Q H . 2022 . Single object tracking research： a survey . Chinese Journal of Computers ， 45 （ 9 ）： 1877 - 1907

韩瑞泽，冯伟，郭青，胡清华 . 2022 . 视频单目标跟踪研究进展综述 . 计算机学报， 45 （ 9 ）： 1877 - 1907 ［ DOI： 10.11897/SP.J.1016.2022.01877 http://dx.doi.org/10.11897/SP.J.1016.2022.01877 ］

Hare S ， Golodetz S ， Saffari A ， Vineet V ， Cheng M M ， Hicks S L and Torr P H S . 2016 . Struck： structured output tracking with kernels . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 38 （ 10 ）： 2096 - 2109 ［ DOI： 10.1109/TPAMI.2015.2509974 http://dx.doi.org/10.1109/TPAMI.2015.2509974 ］

He K M ， Zhang X Y ， Ren S Q and Sun J . 2016 . Deep residual learning for image recognition // Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition . Las Vegas， USA ： IEEE： 770 - 778 ［ DOI： 10.1109/CVPR.2016.90 http://dx.doi.org/10.1109/CVPR.2016.90 ］

He S F ， Lau R W H ， Yang Q X ， Wang J and Yang M H . 2017 . Robust object tracking via locality sensitive histograms . IEEE Transactions on Circuits and Systems for Video Technology ， 27 （ 5 ）： 1006 - 1017 ［ DOI： 10.1109/TCSVT.2016.2527300 http://dx.doi.org/10.1109/TCSVT.2016.2527300 ］

He S F ， Yang Q X ， Lau R W H ， Wang J and Yang M H . 2013 . Visual tracking via locality sensitive histograms // Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition . Portland， USA ： IEEE： 2427 - 2434 ［ DOI： 10.1109/CVPR.2013.314 http://dx.doi.org/10.1109/CVPR.2013.314 ］

Henriques J F ， Caseiro R ， Martins P and Batista J . 2012 . Exploiting the circulant structure of tracking-by-detection with kernels // Proceedings of the 12th European Conference on Computer Vision . Florence， Italy ： Springer： 702 - 715 ［ DOI： 10.1007/978-3-642-33765-9_50 http://dx.doi.org/10.1007/978-3-642-33765-9_50 ］

Henriques J F ， Caseiro R ， Martins P and Batista J . 2015 . High-speed tracking with kernelized correlation filters . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 37 （ 3 ）： 583 - 596 ［ DOI： 10.1109/tpami.2014.2345390 http://dx.doi.org/10.1109/tpami.2014.2345390 ］

Hu S Y ， Zhao X and Huang K Q . 2024 . SOTVerse： a user-defined task space of single object tracking . International Journal of Computer Vision ， 132 （ 3 ）： 872 - 930 ［ DOI： 10.1007/s11263-023-01908-5 http://dx.doi.org/10.1007/s11263-023-01908-5 ］

Hu S Y ， Zhao X ， Huang L H and Huang K Q . 2023 . Global instance tracking： locating target more like humans . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 45 （ 1 ）： 576 - 592 ［ DOI： 10.1109/TPAMI.2022.3153312 http://dx.doi.org/10.1109/TPAMI.2022.3153312 ］

Huang K Q ， Chen X T ， Kang Y F and Tan T N . 2015 . Intelligent visual surveillance： a review . Chinese Journal of Computers ， 38 （ 6 ）： 1093 - 1118

黄凯奇，陈晓棠，康运锋，谭铁牛 . 2015 . 智能视频监控技术综述 . 计算机学报， 38 （ 6 ）： 1093 - 1118 ［ DOI： 10.11897/SP.J.1016.2015.01093 http://dx.doi.org/10.11897/SP.J.1016.2015.01093 ］

Huang K Q ， Xing J L ， Zhang J G ， Ni W C and Xu B . 2020 . Intelligent technologies of human-computer gaming . Scientia Sinica Informationis ， 50 （ 4 ）： 540 - 550

黄凯奇，兴军亮，张俊格，倪晚成，徐博 . 2020 . 人机对抗智能技术 . 中国科学：信息科学）， 50 （ 4 ）： 540 - 550 ［ DOI： 10.1360/N112019-00048 http://dx.doi.org/10.1360/N112019-00048 ］

Huang K Q ， Zhao X ， Li Q Z and Hu S Y . 2021 . Visual Turing： the next development of computer vision in the view of human-computer gaming . Journal of Graphics ， 42 （ 3 ）： 339 - 348

黄凯奇，赵鑫，李乔哲，胡世宇 . 2021 . 视觉图灵：从人机对抗看计算机视觉下一步发展 . 图学学报， 42 （ 3 ）： 339 - 348 ［ DOI： 10.11996/JG.j.2095-302X.2021030339 http://dx.doi.org/10.11996/JG.j.2095-302X.2021030339 ］

Huang L H and Ma B . 2015 . Tensor pooling for online visual tracking // Proceedings of 2015 IEEE International Conference on Multimedia and Expo . Turin， Italy ： IEEE： #7177452 ［ DOI： 10.1109/ICME.2015.7177452 http://dx.doi.org/10.1109/ICME.2015.7177452 ］

Huang L H ， Zhao X and Huang K Q . 2019 . Bridging the gap between detection and tracking： a unified approach // Proceedings of 2019 IEEE/CVF International Conference on Computer Vision . Seoul， Korea （South）： IEEE： 3998 - 4008 ［ DOI： 10.1109/ICCV.2019.00410 http://dx.doi.org/10.1109/ICCV.2019.00410 ］

Huang L H ， Zhao X and Huang K Q . 2020 . GlobalTrack： a simple and strong baseline for long-term tracking // Proceedings of the 34th AAAI Conference on Artificial Intelligence . New York， USA ： AAAI Press： 11037 - 11044 ［ DOI： 10.1609/aaai.v34i07.6758 http://dx.doi.org/10.1609/aaai.v34i07.6758 ］

Huang L H ， Zhao X and Huang K Q . 2021 . Got-10k： a large high-diversity benchmark for generic object tracking in the wild . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 43 （ 5 ）： 1562 - 1577 ［ DOI： 10.1109/TPAMI.2019.2957464 http://dx.doi.org/10.1109/TPAMI.2019.2957464 ］

Hubel D H and Wiesel T N . 1959 . Receptive fields of single neurones in the cat’s striate cortex . The Journal of Physiology ， 148 （ 3 ）： 574 - 591 ［ DOI： 10.1113/jphysiol.1959.sp006308 http://dx.doi.org/10.1113/jphysiol.1959.sp006308 ］

Hubel D H and Wiesel T N . 1962 . Receptive fields， binocular interaction and functional architecture in the cat’s visual cortex . The Journal of Physiology ， 160 （ 1 ）： 106 - 154 ［ DOI： 10.1113/jphysiol.1962.sp006837 http://dx.doi.org/10.1113/jphysiol.1962.sp006837 ］

Hyvärinen L ， Walthes R ， Jacob N ， Chaplin K N and Leonhardt M . 2014 . Current understanding of what infants see . Current Ophthalmology Reports ， 2 （ 4 ）： 142 - 149 ［ DOI： 10.1007/s40135-014-0056-2 http://dx.doi.org/10.1007/s40135-014-0056-2 ］

Javed S ， Danelljan M ， Khan F S ， Khan M H ， Felsberg M and Matas J . 2023 . Visual object tracking with discriminative filters and siamese networks： a survey and outlook . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 45 （ 5 ）： 6552 - 6574 ［ DOI： 10.1109/TPAMI.2022.3212594 http://dx.doi.org/10.1109/TPAMI.2022.3212594 ］

Kalal Z ， Mikolajczyk K and Matas J . 2012 . Tracking-learning-detection . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 34 （ 7 ）： 1409 - 1422 ［ DOI： 10.1109/TPAMI.2011.239 http://dx.doi.org/10.1109/TPAMI.2011.239 ］

Kirshner A . 1967 . Dynamic acuity a quantiative measure of eye movements . Journal of the American Optometric Association ， 38 （ 6 ）： 460 - 462

Kristan M ， Matas J ， Leonardis A ， Vojir T ， Pflugfelder R ， Fern􀅡ndez G ， Nebehay G ， Porikli F and Cehovin L . 2016 . A novel performance evaluation methodology for single-target trackers . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 38 （ 11 ）： 2137 - 2155 ［ DOI： 10.1109/TPAMI.2016.2516982 http://dx.doi.org/10.1109/TPAMI.2016.2516982 ］

Kristan M ， Pflugfelder R ， Leonardis A ， Matas J ， Porikli F ， Cehovin L ， Nebehay G ， Fernandez G ， Vojir T ， Gatt A ， Khajenezhad A ， Salahledin A ， Soltani-Farani A ， Zarezade A ， Petrosino A ， Milton A ， Bozorgtabar B ， Li B ， Chan C S ， Heng C ， Ward D ， Kearney D ， Monekosso D ， Karaimer H C ， Rabiee H R ， Zhu J K ， Gao J ， Xiao J J ， Zhang J G ， Xing J L ， Huang K Q ， Lebeda K ， Cao L J ， Maresca M E ， Lim M K ， El Helw M ， Felsberg M ， Remagnino P ， Bowden R ， Goecke R ， Stolkin R ， Lim S Y ， Maher S ， Poullot S ， Wong S ， Satoh S ， Chen W H ， Hu W M ， Zhang X Q ， Li Y and Niu Z H . 2013 . The visual object tracking VOT2013 challenge results // Proceedings of 2013 IEEE International Conference on Computer Vision Workshops . Sydney， Australia ： IEEE： 98 - 111 ［ DOI： 10.1109/ICCVW.2013.20 http://dx.doi.org/10.1109/ICCVW.2013.20 ］

Lake B M ， Salakhutdinov R and Tenenbaum J B . 2015 . Human-level concept learning through probabilistic program induction . Science ， 350 （ 6266 ）： 1332 - 1338 ［ DOI： 10.1126/science.aab3050 http://dx.doi.org/10.1126/science.aab3050 ］

Land M F and McLeod P . 2000 . From eye movements to actions： how batsmen hit the ball . Nature Neuroscience ， 3 （ 12 ）： 1340 - 1345 ［ DOI： 10.1038/81887 http://dx.doi.org/10.1038/81887 ］

Langlois T A ， Zhao H C ， Grant E ， Dasgupta I ， Griffiths T L and Jacoby N . 2021 . Passive attention in artificial neural networks predicts human visual selectivity // Proceedings of the 35th Conference on Neural Information Processing Systems . Virtual ： Curran Associates Inc.： 27094 - 27106

Lazebnik S ， Schmid C and Ponce J . 2006 . Beyond bags of features： spatial pyramid matching for recognizing natural scene categories // Proceedings of 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition . New York， USA ： IEEE： 2169 - 2178 ［ DOI： 10.1109/CVPR.2006.68 http://dx.doi.org/10.1109/CVPR.2006.68 ］

Li A N ， Lin M ， Wu Y ， Yang M H and Yan S C . 2016 . NUS-PRO： a new visual tracking challenge . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 38 （ 2 ）： 335 - 349 ［ DOI： 10.1109/TPAMI.2015.2417577 http://dx.doi.org/10.1109/TPAMI.2015.2417577 ］

Li B ， Wu W ， Wang Q ， Zhang F Y ， Xing J L and Yan J J . 2019 . SiamRPN++： evolution of siamese visual tracking with very deep networks // Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Long Beach， USA ： IEEE： 4277 - 4286 ［ DOI： 10.1109/CVPR.2019.00441 http://dx.doi.org/10.1109/CVPR.2019.00441 ］

Li B ， Yan J J ， Wu W ， Zhu Z and Hu X L . 2018 . High performance visual tracking with siamese region proposal network // Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Salt Lake City， USA ： IEEE： 8971 - 8980 ［ DOI： 10.1109/CVPR.2018.00935 http://dx.doi.org/10.1109/CVPR.2018.00935 ］

Li C L ， Lu A D ， Liu L and Tang J . 2023 . Multi-modal visual tracking： a survey . Journal of Image and Graphics ， 28 （ 1 ）： 37 - 56

李成龙，鹿安东，刘磊，汤进 . 2023 . 多模态视觉跟踪方法综述 . 中国图象图形学报， 28 （ 1 ）： 37 - 56 ［ DOI： 10.11834/jig.220578 http://dx.doi.org/10.11834/jig.220578 ］

Li F F ， Fergus R and Perona P . 2006 . One-shot learning of object categories . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 28 （ 4 ）： 594 - 611 ［ DOI： 10.1109/TPAMI.2006.79 http://dx.doi.org/10.1109/TPAMI.2006.79 ］

Li S Y and Yeung D Y . 2017 . Visual object tracking for unmanned aerial vehicles： a benchmark and new motion models // Proceedings of the 31st AAAI Conference on Artificial Intelligence . San Francisco， USA ： AAAI Press： 4140 - 4146 ［ DOI： 10.1609/aaai.v31i1.11205 http://dx.doi.org/10.1609/aaai.v31i1.11205 ］

Li X ， Zha Y F ， Zhang T Z ， Cui Z ， Zuo W M ， Hou Z Q ， Lu H C and Wang H Z . 2019 . Survey of visual object tracking algorithms based on deep learning . Journal of Image and Graphics ， 24 （ 12 ）： 2057 - 2080

李玺，查宇飞，张天柱，崔振，左旺孟，侯志强，卢湖川，王菡子 . 2019 . 深度学习的目标跟踪算法综述 . 中国图象图形学报， 24 （ 12 ）： 2057 - 2080 ［ DOI： 0.11834/jig http://dx.doi.org/0.11834/jig ， 190372 ］

Liang P P ， Blasch E and Ling H B . 2015 . Encoding color information for visual tracking： algorithms and benchmark . IEEE Transactions on Image Processing ， 24 （ 12 ）： 5630 - 5644 ［ DOI： 10.1109/TIP.2015.2482905 http://dx.doi.org/10.1109/TIP.2015.2482905 ］

Liang W X ， Tadesse G A ， Ho D ， Li F F ， Zaharia M ， Zhang C and Zou J . 2022 . Advances， challenges and opportunities in creating data for trustworthy AI . Nature Machine Intelligence ， 4 （ 8 ）： 669 - 677 ［ DOI： 10.1038/s42256-022-00516-1 http://dx.doi.org/10.1038/s42256-022-00516-1 ］

Liang W X and Zou J . 2022 . MetaShift： a dataset of datasets for evaluating contextual distribution shifts and training conflicts ［EB/OL］. ［ 2023-07-10 ］. http://arxiv.org/pdf/2202.06523.pdf http://arxiv.org/pdf/2202.06523.pdf

Lin L T ， Fan H ， Zhang Z P ， Xu Y and Ling H B . 2022 . SwinTrack： a simple and strong baseline for Transformer tracking ［EB/OL］. ［ 2023-07-10 ］. https://arxiv.org/pdf/2112.00995.pdf https://arxiv.org/pdf/2112.00995.pdf

Long G M and Penn D L . 1987 . Dynamic visual acuity： normative functions and practical implications . Bulletin of the Psychonomic Society ， 25 （ 4 ）： 253 - 256 ［ DOI： 10.3758/BF03330347 http://dx.doi.org/10.3758/BF03330347 ］

Lu H C ， Li P X and Wang D . 2018 . Visual object tracking： a survey . Pattern Recognition and Artificial Intelligence ， 31 （ 1 ）： 61 - 76

卢湖川，李佩霞，王栋 . 2018 . 目标跟踪算法综述 . 模式识别与人工智能， 31 （ 1 ）： 61 - 76 ［ DOI： 10.16451/j.cnki.issn1003-6059.201801006 http://dx.doi.org/10.16451/j.cnki.issn1003-6059.201801006 ］

Luiten J ， Voigtlaender P and Leibe B . 2019 . PReMVOS： proposal-generation， refinement and merging for video object segmentation // Proceedings of the 14th Asian Conference on Computer Vision . Perth， Australia ： Springer： 565 - 580 ［ DOI： 10.1007/978-3-030-20870-7_35 http://dx.doi.org/10.1007/978-3-030-20870-7_35 ］

Lukezic A ， Zajc L C ， Vojir T ， Matas J and Kristan M . 2021 . Performance evaluation methodology for long-term single-object tracking . IEEE Transactions on Cybernetics ， 51 （ 12 ）： 6305 - 6318 ［ DOI： 10.1109/TCYB.2020.2980618 http://dx.doi.org/10.1109/TCYB.2020.2980618 ］

Ma C ， Huang J B ， Yang X K and Yang M H . 2015a . Hierarchical convolutional features for visual tracking // Proceedings of 2015 IEEE International Conference on Computer Vision . Santiago， Chile ： IEEE： 3074 - 3082 ［ DOI： 10.1109/ICCV.2015.352 http://dx.doi.org/10.1109/ICCV.2015.352 ］

Ma C ， Yang X K ， Zhang C Y and Yang M H . 2015b . Long-term correlation tracking // Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition . Boston， USA ： IEEE： 5388 - 5396 ［ DOI： 10.1109/CVPR.2015.7299177 http://dx.doi.org/10.1109/CVPR.2015.7299177 ］

Marr D . 2010 . Vision： A Computational Investigation into the Human Representation and Processing of Visual Information . Massachusetts， USA ： The MIT Press

Marvasti-Zadeh S M ， Cheng L ， Ghanei-Yakhdan H and Kasaei S . 2022 . Deep learning for visual tracking： a comprehensive survey . IEEE Transactions on Intelligent Transportation Systems ， 23 （ 5 ）： 3943 - 3968 ［ DOI： 10.1109/TITS.2020.3046478 http://dx.doi.org/10.1109/TITS.2020.3046478 ］

Mayer C ， Danelljan M ， Pani Paudel D and van Gool L . 2021 . Learning target candidate association to keep track of what not to track // Proceedings of 2021 IEEE/CVF International Conference on Computer Vision . Montreal， Canada ： IEEE： 13424 - 13434 ［ DOI： 10.1109/ICCV48922.2021.01319 http://dx.doi.org/10.1109/ICCV48922.2021.01319 ］

Miller G A . 1995 . WordNet： a lexical database for English . Communications of the ACM ， 38 （ 11 ）： 39 - 41 ［ DOI： 10.1145/219717.219748 http://dx.doi.org/10.1145/219717.219748 ］

Miller J W . 1958 . Study of visual acuity during the ocular pursuit of moving test objects. II. Effects of direction of movement， relative movement， and illumination . Journal of the Optical Society of America ， 48 （ 11 ）： 803 - 808 ［ DOI： 10.1364/josa.48.000803 http://dx.doi.org/10.1364/josa.48.000803 ］

Miller J W and Ludvigh E . 1962 . The effect of relative motion on visual acuity . Survey of Ophthalmology ， 7 ： 83 - 116

Mueller M ， Smith N and Ghanem B . 2016 . A benchmark and simulator for UAV tracking // Proceedings of the 14th European Conference on Computer Vision . Amsterdam， the Netherlands ： Springer： 445 - 461 ［ DOI： 10.1007/978-3-319-46448-0_27 http://dx.doi.org/10.1007/978-3-319-46448-0_27 ］

Müller M ， Bibi A ， Giancola S ， Alsubaihi S and Ghanem B . 2018 . TrackingNet： a large-scale dataset and benchmark for object tracking in the wild // Proceedings of the 15th European Conference on Computer Vision . Munich， Germany ： Springer： 310 - 327 ［ DOI： 10.1007/978-3-030-01246-5_19 http://dx.doi.org/10.1007/978-3-030-01246-5_19 ］

Nam H and Han B . 2016 . Learning multi-domain convolutional neural networks for visual tracking // Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition . Las Vegas， USA ： IEEE： 4293 - 4302 ［ DOI： 10.1109/CVPR.2016.465 http://dx.doi.org/10.1109/CVPR.2016.465 ］

Pylyshyn Z W and Storm R W . 1988 . Tracking multiple independent targets： evidence for a parallel tracking mechanism . Spatial Vision ， 3 （ 3 ）： 179 - 197 ［ DOI： 10.1163/156856888x00122 http://dx.doi.org/10.1163/156856888x00122 ］

Quevedo L ， Aznar-Casanova J A and Da Silva J A . 2018 . Dynamic visual acuity . Trends in Psychology ， 26 （ 3 ）： 1283 - 1297 ［ DOI： 10.9788/TP2018.3-06En http://dx.doi.org/10.9788/TP2018.3-06En ］

Quevedo L ， Aznar-Casanova J A ， Merindano-Encina D ， Cardona G and Solé-Fortó J . 2012 . A novel computer software for the evaluation of dynamic visual acuity . Journal of Optometry ， 5 （ 3 ）： 131 - 138 ［ DOI： 10.1016/j.optom.2012.05.003 http://dx.doi.org/10.1016/j.optom.2012.05.003 ］

Real E ， Shlens J ， Mazzocchi S ， Pan X and Vanhoucke V . 2017 . YouTube-BoundingBoxes： a large high-precision human annotated data set for object detection in video // Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition . Honolulu， USA ： IEEE： 7464 - 7473 ［ DOI： 10.1109/CVPR.2017.789 http://dx.doi.org/10.1109/CVPR.2017.789 ］

Ross D A ， Lim J ， Lin R S and Yang M H . 2008 . Incremental learning for robust visual tracking . International Journal of Computer Vision ， 77 （ 1 ）： 125 - 141 ［ DOI： 10.1007/s11263-007-0075-7 http://dx.doi.org/10.1007/s11263-007-0075-7 ］

Russakovsky O ， Deng J ， Su H ， Krause J ， Satheesh S ， Ma S A ， Huang Z H ， Karpathy A ， Khosla A ， Bernstein M ， Berg A C and Li F F . 2015 . ImageNet large scale visual recognition challenge . International Journal of Computer Vision ， 115 （ 3 ）： 211 - 252 ［ DOI： 10.1007/s11263-015-0816-y http://dx.doi.org/10.1007/s11263-015-0816-y ］

Silver D ， Schrittwieser J ， Simonyan K ， Antonoglou I ， Huang A ， Guez A ， Hubert T ， Baker L ， Lai M ， Bolton A ， Chen Y T ， Lillicrap T ， Hui F ， Sifre L ， van den Driessche G ， Graepel T and Hassabis D . 2017 . Mastering the game of Go without human knowledge . Nature ， 550 （ 7676 ）： 354 - 359 ［ DOI： 10.1038/nature24270 http://dx.doi.org/10.1038/nature24270 ］

Simonyan K and Zisserman A . 2015 . Very deep convolutional networks for large-scale image recognition ［EB/OL］. ［ 2023-07-10 ］. https://arxiv.org/pdf/1409.1556.pdf

Smeulders A W M ， Chu D M ， Cucchiara R ， Calderara S ， Dehghan A and Shah M . 2014 . Visual tracking： an experimental survey . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 36 （ 7 ）： 1442 - 1468 ［ DOI： 10.1109/TPAMI.2013.230 http://dx.doi.org/10.1109/TPAMI.2013.230 ］

Sudderth E B ， Torralba A ， Freeman W T and Willsky A S . 2005 . Learning hierarchical models of scenes， objects， and parts // Proceedings of the 10th IEEE International Conference on Computer Vision . Beijing， China ： IEEE： 1331 - 1338 ［ DOI： 10.1109/ICCV.2005.137 http://dx.doi.org/10.1109/ICCV.2005.137 ］

Tian Z ， Shen C H ， Chen H and He T . 2019 . FCOS： fully convolutional one-stage object detection // Proceedings of 2019 IEEE/CVF International Conference on Computer Vision . Seoul， Korea （South）： IEEE： 9626 - 9635 ［ DOI： 10.1109/ICCV.2019.00972 http://dx.doi.org/10.1109/ICCV.2019.00972 ］

Treisman A M and Gelade G . 1980 . A feature-integration theory of attention . Cognitive Psychology ， 12 （ 1 ）： 97 - 136 ［ DOI： 10.1016/0010-0285（80）90005-5 http://dx.doi.org/10.1016/0010-0285（80）90005-5 ］

Turing A M . 2009 . Computing machinery and intelligence //Epstein R， Roberts G and Beber G， eds. Parsing the Turing Test . Dordrecht ： Springer： 23 - 65 ［ DOI： 10.1007/978-1-4020-6710-5_3 http://dx.doi.org/10.1007/978-1-4020-6710-5_3 ］

Valmadre J ， Bertinetto L ， Henriques J F ， Tao R ， Vedaldi A ， Smeulders A W M ， Torr P H S and Gavves E . 2018 . Long-term tracking in the wild： a benchmark // Proceedings of the 15th European Conference on Computer Vision . Munich， Germany ： Springer： 692 - 707 ［ DOI： 10.1007/978-3-030-01219-9_41 http://dx.doi.org/10.1007/978-3-030-01219-9_41 ］

Vaswani A ， Shazeer N ， Parmar N ， Uszkoreit J ， Jones L ， Gomez A N ， Kaiser Ł and Polosukhin I . 2017 . Attention is all you need // Proceedings of the 31st International Conference on Neural Information Processing Systems . Long Beach， USA ： Curran Associates Inc.： 6000 - 6010

Voigtlaender P ， Luiten J ， Torr P H S and Leibe B . 2020 . Siam R-CNN： visual tracking by re-detection // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， USA ： IEEE： 6577 - 6587 ［ DOI： 10.1109/CVPR42600.2020.00661 http://dx.doi.org/10.1109/CVPR42600.2020.00661 ］

Wang D ， Lu H C and Yang M H . 2013 . Online object tracking with sparse prototypes . IEEE Transactions on Image Processing ， 22 （ 1 ）： 314 - 325 ［ DOI： 10.1109/TIP.2012.2202677 http://dx.doi.org/10.1109/TIP.2012.2202677 ］

Wang D ， Lu H C and Yang M H . 2016 . Robust visual tracking via least soft-threshold squares . IEEE Transactions on Circuits and Systems for Video Technology ， 26 （ 9 ）： 1709 - 1721 ［ DOI： 10.1109/TCSVT.2015.2462012 http://dx.doi.org/10.1109/TCSVT.2015.2462012 ］

Wang N ， Zhou W G ， Wang J and Li H Q . 2021 . Transformer meets tracker： exploiting temporal context for robust visual tracking // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville， USA ： IEEE： 1571 - 1580 ［ DOI： 10.1109/CVPR46437.2021.00162 http://dx.doi.org/10.1109/CVPR46437.2021.00162 ］

Wang Q ， Gao J ， Xing J L ， Zhang M D and Hu W M . 2017a . DCFNet： discriminant correlation filters network for visual tracking ［EB/OL］. ［ 2023-07-10 ］. https://arxiv.org/pdf/1704.04057.pdf

Wang X L ， He K M and Gupta A . 2017b . Transitive invariance for self-supervised visual representation learning // Proceedings of 2017 IEEE International Conference on Computer Vision . Venice， Italy ： IEEE： 1338 - 1347 ［ DOI： 10.1109/ICCV.2017.149 http://dx.doi.org/10.1109/ICCV.2017.149 ］

Wu Y ， Lim J and Yang M H . 2013 . Online object tracking： a benchmark // Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition . Portland， USA ： IEEE： 2411 - 2418 ［ DOI： 10.1109/CVPR.2013.312 http://dx.doi.org/10.1109/CVPR.2013.312 ］

Wu Y ， Lim J and Yang M H . 2015 . Object tracking benchmark . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 37 （ 9 ）： 1834 - 1848 ［ DOI： 10.1109/TPAMI.2014.2388226 http://dx.doi.org/10.1109/TPAMI.2014.2388226 ］

Xia C ， Han J W and Zhang D W . 2021 . Evaluation of saccadic scanpath prediction： subjective assessment database and recurrent neural network based metric . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 43 （ 12 ）： 4378 - 4395 ［ DOI： 10.1109/TPAMI.2020.3002168 http://dx.doi.org/10.1109/TPAMI.2020.3002168 ］

Xiang Y ， Alahi A and Savarese S . 2015 . Learning to track： online multi-object tracking by decision making // Proceedings of 2015 IEEE International Conference on Computer Vision . Santiago， Chile ： IEEE： 4705 - 4713 ［ DOI： 10.1109/ICCV.2015.534 http://dx.doi.org/10.1109/ICCV.2015.534 ］

Xu N ， Yang L J ， Fan Y C ， Yang J C ， Yue D C ， Liang Y C ， Price B ， Cohen S and Huang T . 2018 . YouTube-VOS： sequence-to-sequence video object segmentation // Proceedings of the 15th European Conference on Computer Vision . Munich， Germany ： Springer： 603 - 619 ［ DOI： 10.1007/978-3-030-01228-1_36 http://dx.doi.org/10.1007/978-3-030-01228-1_36 ］

Xu Y D ， Wang Z Y ， Li Z X ， Yuan Y and Yu G . 2020 . SiamFC++： towards robust and accurate visual tracking with target estimation guidelines // Proceedings of the 34th AAAI Conference on Artificial Intelligence . New York， USA ： AAAI Press： 12549 - 12556 ［ DOI： 10.1609/aaai.v34i07.6944 http://dx.doi.org/10.1609/aaai.v34i07.6944 ］

Yan B ， Peng H W ， Fu J L ， Wang D and Lu H C . 2021 . Learning spatio-temporal Transformer for visual tracking // Proceedings of 2021 IEEE/CVF International Conference on Computer Vision . Montreal， Canada ： IEEE： 10428 - 10437 ［ DOI： 10.1109/ICCV48922.2021.01028 http://dx.doi.org/10.1109/ICCV48922.2021.01028 ］

Yan B ， Zhao H J ， Wang D ， Lu H C and Yang X Y . 2019 . ‘Skimming-perusal’ tracking： a framework for real-time and robust long-term tracking // Proceedings of 2019 IEEE/CVF International Conference on Computer Vision . Seoul， Korea （South）： IEEE： 2385 - 2393 ［ DOI： 10.1109/ICCV.2019.00247 http://dx.doi.org/10.1109/ICCV.2019.00247 ］

Ye B T ， Chang H ， Ma B P ， Shan S G and Chen X L . 2022 . Joint feature learning and relation modeling for tracking： a one-stream framework // Proceedings of the 17th European Conference on Computer Vision . Tel Aviv， Israel ： Springer： 341 - 357 ［ DOI： 10.1007/978-3-031-20047-2_20 http://dx.doi.org/10.1007/978-3-031-20047-2_20 ］

Yu B ， Tang M ， Zheng L Y ， Zhu G B ， Wang J Q ， Feng H ， Feng X T and Lu H Q . 2021 . High-performance discriminative tracking with Transformers // Proceedings of 2021 IEEE/CVF International Conference on Computer Vision . Montreal， Canada ： IEEE： 9836 - 9845 ［ DOI： 10.1109/ICCV48922.2021.00971 http://dx.doi.org/10.1109/ICCV48922.2021.00971 ］

Yu C S ， Wang E M Y ， Li W C and Braithwaite G . 2014 . Pilots’ visual scan patterns and situation awareness in flight operations . Aviation， Space， and Environmental Medicine ， 85 （ 7 ）： 708 - 714 ［ DOI： 10.3357/asem.3847.2014 http://dx.doi.org/10.3357/asem.3847.2014 ］

Yu H Y ， Li G R ， Zhang W G ， Huang Q M ， Du D W ， Tian Q and Sebe N . 2020 . The unmanned aerial vehicle benchmark： object detection， tracking and baseline . International Journal of Computer Vision ， 128 （ 5 ）： 1141 - 1159 ［ DOI： 10.1007/s11263-019-01266-1 http://dx.doi.org/10.1007/s11263-019-01266-1 ］

Yun S ， Choi J ， Yoo Y ， Yun K and Choi J Y . 2017 . Action-decision networks for visual tracking with deep reinforcement learning // Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition . Honolulu， USA ： IEEE： 1349 - 1358 ［ DOI： 10.1109/CVPR.2017.148 http://dx.doi.org/10.1109/CVPR.2017.148 ］

Zhang R ， Isola P and Efros A A . 2016 . Colorful image colorization // Proceedings of the 14th European Conference on Computer Vision . Amsterdam， the Netherlands ： Springer： 649 - 666 ［ DOI： 10.1007/978-3-319-46487-9_40 http://dx.doi.org/10.1007/978-3-319-46487-9_40 ］

Zhang T Z ， Ghanem B ， Liu S and Ahuja N . 2013 . Robust visual tracking via structured multi-task sparse learning . International Journal of Computer Vision ， 101 （ 2 ）： 367 - 383 ［ DOI： 10.1007/s11263-012-0582-z http://dx.doi.org/10.1007/s11263-012-0582-z ］

Zhang Z P and Peng H W . 2019 . Deeper and wider siamese networks for real-time visual tracking // Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Long Beach， USA ： IEEE： 4586 - 4595 ［ DOI： 10.1109/CVPR.2019.00472 http://dx.doi.org/10.1109/CVPR.2019.00472 ］

Zhang Z P ， Peng H W ， Fu J L ， Li B and Hu W M . 2020 . Ocean： object-aware anchor-free tracking // Proceedings of the 16th European Conference on Computer Vision . Glasgow， UK ： Springer： 771 - 787 ［ DOI： 10.1007/978-3-030-58589-1_46 http://dx.doi.org/10.1007/978-3-030-58589-1_46 ］

Zhu Z ， Wang Q ， Li B ， Wu W ， Yan J J and Hu W M . 2018 . Distractor-aware siamese networks for visual object tracking // Proceedings of the 15th European Conference on Computer Vision . Munich， Germany ： Springer： 103 - 119 ［ DOI： 10.1007/978-3-030-01240-3_7 http://dx.doi.org/10.1007/978-3-030-01240-3_7 ］
