Methods of view planning of object detection: a survey
2024, Pages: 1-19
Published online: 2024-09-03
DOI: 10.11834/jig.240319
Wang Jianyu, Zhu Feng, Hao Yingming, et al. Methods of view planning of object detection: a survey[J]. Journal of Image and Graphics, 2024: 1-19 [DOI: 10.11834/jig.240319]
Object detection is one of the fundamental research directions in computer vision. During image acquisition, factors such as densely placed objects and poor lighting conditions cause images to lose detail; when such images are used as input, conventional object detection algorithms cannot produce results that meet the task requirements. To solve this problem, view planning for object detection has emerged as an intelligent perception method: it autonomously analyzes the factors that hinder the detection task under the current conditions and adjusts the camera's pose parameters to avoid them, thereby achieving accurate detection of the target. View planning methods for object detection not only support other areas of computer vision but will also bring convenience to future intelligent living. To reflect the research status and latest progress of this field, this paper reviews the literature published since 2007 and summarizes domestic and foreign research methods. First, taking the scene dimensionality and the adjusted parameters as classification criteria, we divide view planning methods for object detection into three categories: methods that adjust two-dimensional pixels, methods that move in three-dimensional space, and methods that combine the two; this paper focuses on analyzing and summarizing the first two categories. Second, we explain the basic idea of each category, identify the key problems each must solve, review and analyze the main approaches to these problems, and summarize their respective advantages and limitations. In addition, we briefly introduce the datasets and evaluation metrics available in each type of scenario. Finally, based on the analysis of current methods, we discuss the challenges facing view planning for object detection and offer an outlook on future research.
Object detection is one of the fundamental research directions in computer vision and a cornerstone of higher-level vision research. When objects are densely placed or lighting conditions are poor, much detail is lost during image acquisition, and conventional object detection algorithms fed with such detail-deficient images cannot meet the task requirements. To solve this problem, intelligent perception methods that plan viewpoints for object detection have emerged. These methods autonomously analyze the factors that hinder the detection task under the current conditions and adjust the camera's pose parameters to avoid their effects, thereby achieving accurate detection of targets. To reflect the research status and latest development of viewpoint planning methods for object detection, this survey reviews and analyzes relevant studies published since 2007 and summarizes domestic and foreign research methods. For brevity, this class of methods is called active object detection (AOD) in this article.

According to the usage scenario, this paper divides AOD methods into three categories: AOD in two-dimensional scenes, AOD in three-dimensional scenes, and AOD combining the two. Because the third category is uncommon, this paper mainly introduces the first two. More specifically, in two-dimensional scenes, AOD methods are divided into pixel-based methods and methods that simulate camera parameters, depending on whether planning operates on individual pixels or on the image as a whole. The core of the pixel-based approach is how the target pixel is selected and how the next pixel is planned. Researchers typically locate likely target pixels using integral features, scale features, or keypoints, i.e., the parts of the image where the gap between target and background is largest. Once the target pixel is located, the position of the next pixel is set according to the category of its region, which preserves continuity between consecutive frames and avoids task failures caused by planning errors. For AOD methods that simulate camera parameters, different influencing factors cause different detection difficulties; researchers have therefore designed different planning schemes by analyzing the types of influencing factors, and some excellent results have emerged in recent years.

Over time, the popularity of mobile robots has brought AOD into a new setting: three-dimensional scenes. In a 3D scene, an AOD method controls an intelligent agent that actively selects the pose of the next viewpoint in space so as to remove the influence of interfering factors on the detection process. According to how much spatial location information is known, we classify these settings into 3D scenes with known spatial relationships and 3D scenes with unknown spatial relationships. In the first type, the placement of the target object and surrounding objects, the spatial category markers, and the range of viewpoint planning are all known, and the AOD method can plan viewpoints from this information; here researchers focus on how spatial relationships are represented and how the next viewpoint is selected within a fixed search space. In the second type, no prior information is available to assist, and the agent can rely only on its observations to select the next viewpoint.
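As a concrete illustration of the selection step in a fixed search space, the following minimal Python sketch applies the common one-step (myopic) next-best-view rule: score each unvisited candidate pose by predicted detection gain minus a weighted travel cost, and move to the best one. The function names and the two scoring callables are assumptions made for illustration only; they are not drawn from any particular surveyed method.

```python
# A minimal sketch, assuming a discrete candidate set and two user-supplied
# scoring callables; it illustrates the one-step (myopic) next-best-view
# pattern described above, not an implementation from any surveyed paper.
import math

def next_best_view(candidates, visited, detect_gain, move_cost, alpha=0.1):
    """Pick the unvisited pose maximizing predicted detection gain minus a
    weighted travel penalty.

    candidates  -- iterable of camera poses (any equatable format)
    visited     -- poses already observed
    detect_gain -- callable pose -> predicted detection confidence
    move_cost   -- callable pose -> cost of moving the camera to that pose
    alpha       -- trade-off between detection quality and travel effort
    """
    unvisited = [p for p in candidates if p not in visited]
    if not unvisited:
        return None  # search space exhausted
    return max(unvisited, key=lambda p: detect_gain(p) - alpha * move_cost(p))

# Hypothetical usage: four poses on a unit viewing circle around the object.
poses = [(math.cos(t), math.sin(t)) for t in (0.0, 1.57, 3.14, 4.71)]
best = next_best_view(
    poses,
    visited=[poses[0]],
    detect_gain=lambda p: 1.0 - abs(p[1]),       # stand-in confidence model
    move_cost=lambda p: math.dist(poses[0], p),  # distance from current pose
)
```

A non-myopic planner would optimize this score over a sequence of future poses rather than a single step, which is what makes the learned planning strategies discussed next attractive.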
In real life, situations where the spatial relationships are unknown are more common, so designing AOD methods for this setting is currently a hot research direction. Because the planning strategy in such scenes is closely tied to the observed results, researchers have devoted much effort to describing observations in detail. In AOD, the observation information is usually called the state expression, and the more detailed the expression, the better the generated strategy. In addition, to evaluate the next viewpoint and refine the planning strategy, researchers have also invested heavily in the evaluation function of the next view. AOD in unknown environments has two main objectives: path optimization and detection-effect optimization. Evaluation functions are generally divided into single-element and multi-element evaluation according to the types of factors they consider. Although multi-element evaluation is more accurate, the choice of elements differs from problem to problem and lacks consistency; finding components shared across scenarios and designing a universal evaluation function remains a direction in which researchers can make breakthroughs.

Beyond the analysis of the methods above, this article also briefly introduces the datasets that AOD methods can use in each type of scenario. Viewpoint planning in two-dimensional scenes uses the same scenes as conventional object detection, so the datasets largely overlap, including large-scale public datasets such as COCO and PASCAL VOC; the evaluation metrics of the two are also essentially the same, so performance can be compared directly. In three-dimensional settings, motion must be taken into account, so detection results on 3D datasets such as AVD and T-LESS cannot by themselves establish whether a movement path is correct. Researchers have therefore designed the task success rate (SR) and the average travel distance as the main indicators of an AOD algorithm's effectiveness; a small computational sketch of these two indicators follows this abstract.

It should be emphasized that although viewpoint planning methods for object detection have achieved many excellent results, there is still room for improvement in scene design and research methodology. First, real physical elements can be added to the scene design to turn the planning problem into an optimization problem under certain constraints. Second, the methods suited to two-dimensional and three-dimensional scenes can be combined more closely, so that changing sensor parameters enables accurate detection in areas the agent cannot reach. Finally, detection-oriented viewpoint planning methods usually output discrete actions and are tightly bound to a specific task, so viewpoint planning in continuous environments and generic, task-independent planning frameworks can also be considered future directions.
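To make the two indicators above concrete, here is a minimal, hedged Python sketch that computes SR and average travel distance over a list of evaluation episodes; the episode record format is an assumption for illustration, not a standard benchmark API.

```python
# A hedged sketch of the two indicators named above -- task success rate (SR)
# and average travel distance -- computed over evaluation episodes. The
# episode record format is an assumption made for illustration only.

def evaluate_aod(episodes):
    """episodes: list of records like {"success": bool, "distance": float},
    one record per evaluation run; distance is the traveled path length."""
    n = len(episodes)
    success_rate = sum(e["success"] for e in episodes) / n
    avg_distance = sum(e["distance"] for e in episodes) / n
    return success_rate, avg_distance

# Example: two of three runs found the target; paths measured in meters.
sr, dist = evaluate_aod([
    {"success": True,  "distance": 4.2},
    {"success": False, "distance": 9.0},
    {"success": True,  "distance": 3.1},
])
print(f"SR = {sr:.2f}, average travel distance = {dist:.2f} m")
# SR = 0.67, average travel distance = 5.43 m
```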
object detection; active vision; parameter adjustment; view planning; intelligent perception