View planning methods for object detection: a survey
2025, Vol. 30, No. 3, Pages 641-659
Received: 2024-06-18; Revised: 2024-08-28; Published in print: 2025-03-16
DOI: 10.11834/jig.240319
Object detection is one of the fundamental research directions in computer vision. When objects are densely arranged or lighting is poor during image acquisition, images lose detail, and with such images as input, conventional object detection algorithms produce results that fail to meet task requirements. To address these problems, viewpoint planning for object detection has emerged as an intelligent perception method: it autonomously analyzes the factors that hinder the detection task under the current conditions and adjusts the camera's pose parameters to avoid their influence, thereby achieving accurate detection of the target. Viewpoint planning for object detection can not only support other areas of computer vision but will also bring convenience to future intelligent living. To reflect the state of research and the latest progress, this paper reviews the literature since 2007 and summarizes research methods at home and abroad. First, taking the scene dimensionality and the adjusted parameters as classification criteria, viewpoint planning methods for object detection are divided into three categories: planning by two-dimensional pixel adjustment, planning by movement in three-dimensional space, and planning that combines the two; this paper focuses on analyzing and summarizing the first two categories. Second, the basic idea of each category is explained, the key problems each category must solve are identified, the main research methods for solving these problems are summarized and analyzed, and their respective advantages and limitations are discussed. In addition, the datasets and evaluation metrics available for each type of scenario are briefly introduced. Finally, based on the analysis of existing methods, the challenges facing viewpoint planning for object detection are discussed and future research directions are outlined.
Object detection is one of the fundamental research directions in computer vision and a cornerstone of higher-level vision research. When objects are densely arranged or the lighting is poor, crucial details can be lost during image acquisition, and with such degraded images as input, conventional object detection algorithms often fail to meet task requirements. To address these challenges, viewpoint planning for object detection has emerged as an intelligent perception method: it autonomously analyzes the factors that hinder the detection task under the current conditions, adjusts the camera’s pose parameters to mitigate their influence, and thereby achieves accurate detection of the target. This paper reviews relevant studies since 2007 and summarizes domestic and foreign research methods to reflect the state of research and the latest developments in viewpoint planning for object detection; for brevity, this class of methods is referred to as active object detection (AOD) in this article. According to the application scenario, AOD methods are divided into three categories: AOD in two-dimensional scenes, AOD in three-dimensional scenes, and AOD that combines the two. Because the third category is uncommon, this paper mainly introduces the first two. In two-dimensional scenes, AOD methods are further divided into pixel-based methods and methods that adjust camera parameters, depending on whether planning operates on individual pixels or on the image as a whole. The core of the pixel-based approach is the selection of the target pixel and the strategy for planning the next pixel. Researchers typically locate candidate target pixels using integral features, scale features, or key points, that is, the parts of the target that differ most strongly from the background. After the target pixel is located, the position of the next pixel is set according to the category of the region so as to maintain continuity between consecutive frames and avoid task failure caused by planning errors. For AOD methods that adjust camera parameters, different influencing factors create different difficulties for detection; researchers have therefore designed distinct planning schemes by analyzing the types of influencing factors, and some excellent results have emerged in recent years. With the growing popularity of mobile robots, AOD has entered a new setting: three-dimensional scenes. In 3D environments, an AOD method enables the agent to actively select the pose of the next viewpoint in space, thereby mitigating the influence of interfering factors on the detection process. We classify 3D scenes into two categories according to how much spatial location information is known: scenes with known spatial relationships and scenes with unknown spatial relationships. In the first type of scene, the placement of the target and surrounding objects, the spatial category markers, and the range of viewpoint planning are all known, and the AOD method can plan viewpoints on the basis of this information. For such approaches, researchers focus mainly on how to represent the relationships and how to select the next viewpoint within a fixed search space.
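As a concrete illustration of selecting the next viewpoint within a fixed, known search space, the minimal Python sketch below greedily scores a finite set of candidate camera poses with a detection-confidence score and returns the best one. It is a simplified assumption for illustration only, not a method from the surveyed literature; `render_view` and `detection_confidence` are hypothetical placeholders for an image source and an object detector.

```python
# Greedy next-best-view (NBV) selection over a fixed candidate viewpoint set.
# Hypothetical sketch: render_view() and detection_confidence() stand in for a
# simulator/camera and an object detector; they are not from any surveyed method.
import math
import random

def render_view(pose):
    """Capture (or render) an observation at the given camera pose."""
    return {"pose": pose}

def detection_confidence(observation):
    """Toy detector score in [0, 1]; a real system would run an object detector here."""
    azimuth, elevation = observation["pose"]
    # Toy model: confidence is highest when the camera faces the object head-on.
    return max(0.0, math.cos(azimuth) * math.cos(elevation)) + random.uniform(0.0, 0.05)

def next_best_view(candidate_poses):
    """Score every candidate viewpoint and return the pose with the highest confidence."""
    return max(candidate_poses, key=lambda p: detection_confidence(render_view(p)))

if __name__ == "__main__":
    # Candidate viewpoints around the target: (azimuth, elevation) in radians.
    candidates = [(a * math.pi / 6, e * math.pi / 12)
                  for a in range(-3, 4) for e in range(0, 3)]
    print("selected viewpoint:", next_best_view(candidates))
```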
The second type of scene offers no prior information to assist planning, so the agent can rely only on its observations to select the next viewpoint. Because unknown spatial relationships are very common in real life, designing AOD methods for this situation is currently a popular research direction. Since the planning strategy in such scenes depends closely on the observed results, researchers have devoted considerable effort to describing observations in detail; in AOD, this observation information is usually called the state representation, and a more detailed representation leads to better strategy generation. Researchers have also invested heavily in the evaluation function of the next view, which scores candidate viewpoints and refines the planning strategy. In unknown environments, AOD has two main objectives: path optimization and optimization of the detection result. Evaluation functions are generally divided into single-element and multi-element evaluations according to the types of factors considered. Although multi-element evaluation is more accurate, the elements must be matched closely to each specific problem; identifying components shared across scenarios and designing a universal evaluation function therefore remains a potential breakthrough area for future research. In addition to analyzing the methods above, this paper briefly introduces the datasets that AOD methods can use in each type of scenario. Viewpoint planning in two-dimensional scenes uses the same kinds of scenes as conventional object detection, so the datasets overlap substantially with large-scale public datasets such as COCO and Pascal VOC; the evaluation indicators are also the same, allowing direct performance comparison. In three-dimensional settings, motion must be taken into account, so detection accuracy on 3D datasets such as AVD and T-LESS cannot by itself reflect the quality of the movement path; researchers have therefore adopted task success rate (SR) and average travel distance as the leading indicators of AOD algorithm effectiveness. Notably, although viewpoint planning methods for object detection have achieved many excellent results, both scene design and research methodology still leave room for improvement. First, real physical elements can be added to the scene design to turn the planning problem into an optimization problem under constraints. Second, methods suited to two- and three-dimensional scenes can be more closely combined so that, in regions the robot cannot reach, accurate detection is still achieved by changing the sensor parameters. Finally, detection-oriented viewpoint planning methods typically output discrete actions and are tightly bound to a specific task; viewpoint planning in continuous environments and generic, task-independent planning frameworks are therefore promising future directions.
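For readers unfamiliar with how the two leading indicators are computed, the short sketch below shows one plausible implementation of task success rate (SR) and average travel distance. The episode record format (a `success` flag plus the visited `path`) is an assumption made for illustration; individual papers log slightly different fields.

```python
# Computing the two leading AOD indicators discussed above: task success rate (SR)
# and average travel distance. The episode format (a success flag plus the sequence
# of visited camera positions) is an assumption made for illustration.
import math

def travel_distance(path):
    """Total Euclidean length of a camera path given as a list of (x, y, z) positions."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))

def evaluate(episodes):
    """Return (success rate, average travel distance) over a list of planning episodes."""
    sr = sum(e["success"] for e in episodes) / len(episodes)
    avg_dist = sum(travel_distance(e["path"]) for e in episodes) / len(episodes)
    return sr, avg_dist

if __name__ == "__main__":
    episodes = [
        {"success": True,  "path": [(0, 0, 0), (1, 0, 0), (1, 1, 0)]},
        {"success": False, "path": [(0, 0, 0), (0, 2, 0)]},
    ]
    sr, avg_dist = evaluate(episodes)
    print(f"SR = {sr:.2f}, average travel distance = {avg_dist:.2f}")
```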