A survey on deep learning based visual object detection

Jiale Cao; Yali Li; Hanqing Sun; Jin Xie; Kaiqi Huang; Yanwei Pang

doi:10.11834/jig.220069

Visual Understanding and Computational Imaging


2022, Vol. 27, No. 6, pp. 1697-1722
Published in print: 2022-06-16
Accepted: 2022-03-04

Jiale Cao, Yali Li, Hanqing Sun, Jin Xie, Kaiqi Huang, Yanwei Pang. A survey on deep learning based visual object detection[J]. Journal of Image and Graphics, 2022, 27(6): 1697-1722. DOI: 10.11834/jig.220069.

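Abstract (Chinese version)

Visual object detection aims to localize and recognize the objects present in an image. It is one of the classical tasks in computer vision and a prerequisite for many other vision tasks, has important applications in areas such as autonomous driving and video surveillance, and has attracted wide attention from researchers. With the rapid development of deep learning, object detection has made great progress. First, this paper summarizes the basic pipeline of deep object detection during training and testing. The training stage comprises data pre-processing, the detection network, label assignment, and loss computation; the testing stage uses the trained detector to generate detections and then post-processes them. Next, the paper reviews visual object detection methods based on a monocular camera, mainly anchor-based methods, anchor-free methods, and end-to-end prediction methods, and summarizes common designs for the sub-modules used in object detection. After the monocular methods, it introduces visual object detection methods based on a stereo camera. On this basis, it compares the research progress on monocular and stereo object detection in China and abroad, and discusses development trends in visual object detection. The summary and analysis are intended as a reference for researchers working on visual object detection.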
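The post-processing applied to the detections at test time is commonly non-maximum suppression (NMS). The sketch below is a minimal, illustrative greedy NMS in plain Python, not the implementation of any particular detector. The function names `iou` and `nms`, the `(x1, y1, x2, y2)` box format, and the default threshold of 0.5 are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    # Visit boxes from the most to the least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))
```

In this usage example the second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the third box does not overlap either and is kept.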

Abstract

Visual object detection aims to locate and recognize objects in images. It is one of the classical tasks in computer vision and the premise and foundation of many other computer vision tasks. It plays a very important role in applications such as autonomous driving and video surveillance, and has attracted extensive attention from researchers over the past few decades. In recent years, with the development of deep learning, visual object detection has made great progress. This paper presents an in-depth survey of deep learning based visual object detection, covering both monocular and stereo object detection.

First, we summarize the pipeline of deep object detection during training and inference. The training process commonly consists of data pre-processing, detection network design, and label assignment and loss function. Data pre-processing (e.g., multi-scale training and flipping) aims to enhance the diversity of the given training data, which can improve the detection performance of the object detector. The detection network usually consists of three key parts: the backbone (e.g., Visual Geometry Group (VGG) and ResNet), the feature fusion module (e.g., feature pyramid network (FPN)), and the prediction network (e.g., region of interest head network (RoI head)). Label assignment assigns a ground-truth target to each prediction, and the loss function supervises network training. During inference, we adopt the trained detector to generate detection bounding boxes and employ a post-processing step (e.g., non-maximum suppression (NMS)) to combine the bounding boxes.

Second, we give an in-depth review of monocular object detection, including anchor-based, anchor-free, and end-to-end methods. Anchor-based methods design a set of default anchors and perform classification and regression based on these anchors; they can be further split into two-stage and one-stage methods. Two-stage methods first generate candidate proposals based on the default anchors and then classify and regress these proposals. In contrast, one-stage methods perform classification and regression directly on the default anchors and usually have a faster inference speed. The representative two-stage methods are the region-based convolutional neural network (R-CNN) series, and the representative one-stage methods are you only look once (YOLO) and single shot detector (SSD). Compared to anchor-based methods, the more robust anchor-free methods perform classification and regression without any hand-crafted default anchors. We split anchor-free methods into keypoint-based methods and center-based methods. Keypoint-based methods predict multiple keypoints of objects for localization, while center-based methods predict the left, right, top, and bottom distances to the object boundary. The representative keypoint-based method is CornerNet, and the representative center-based methods are fully convolutional one-stage detector (FCOS) and CenterNet. Both the anchor-based and the anchor-free methods mentioned above commonly require post-processing to remove redundant detection results for each object. To solve this problem, the recently introduced end-to-end methods directly predict one bounding box for each object, which avoids the post-processing. The representative end-to-end method is detection transformer (DETR), which performs prediction via a transformer module. In addition, we review some classical modules employed in monocular object detection, including the feature pyramid structure, prediction network design, and label assignment and loss function. The feature pyramid structure employs different layers to detect objects of different scales, which addresses the scale variance issue. Prediction network design covers re-designs of the classification and regression branches, aiming to better handle these two sub-tasks. Label assignment and loss function aim to better guide detector training.

Third, we introduce stereo object detection. According to the coordinate space of the features, existing detectors are divided into two categories: frustum-based and inverse-projection-based approaches. Frustum-based approaches directly predict 3D objects on features in the image frustum space. Stereo R-CNN and StereoCenterNet construct stereo features in the image frustum space by concatenating a pair of unary features. Plane-sweeping is another way of constructing frustum features, in the form of cost volumes, and is used in instance-depth-aware 3D detection (IDA-3D) and YOLOStereo3D. In contrast to the frustum-based approaches, inverse-projection-based approaches explicitly project the pixels or frustum features into 3D Cartesian space. There are three main forms of inverse projection: projecting all pixels to the full 3D space as a pseudo point cloud, projecting the cost volume features to 3D volume features, or projecting the pixels in each region proposal to an instance-level point cloud. Pseudo-LiDAR is a pioneering method that transforms stereo images into a point cloud representation, which lets it benefit from advances in both disparity estimation and LiDAR-based 3D detection. Deep stereo geometry network (DSGN) projects the frustum-based cost volume features to 3D volume features and further squeezes them into the bird's eye view (BEV) for detection. Disp R-CNN leverages Mask R-CNN, a representative 2D instance segmentation model, and generates a set of instance-level point clouds for each stereo image pair.

Based on this summary of monocular and stereo object detection, we further compare the progress of domestic and foreign research and present some representative universities and companies working on visual object detection. Finally, we present some development trends in visual object detection, including efficient end-to-end object detection, self-supervised object detection, long-tailed object detection, few-shot and zero-shot object detection, large-scale stereo object detection datasets, and weakly-supervised stereo object detection.
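The inverse projection of all pixels into a pseudo point cloud, as described for Pseudo-LiDAR-style methods, can be illustrated with standard pinhole stereo geometry. The sketch below assumes a rectified stereo pair, a dense disparity map that is already available, and a single focal length in pixels. The function name `disparity_to_points` and its parameter names are hypothetical and chosen only for this example; it is not the code of any method named in the abstract.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def disparity_to_points(disparity: Sequence[Sequence[float]],
                        focal: float, baseline: float,
                        cx: float, cy: float,
                        min_disparity: float = 1e-6) -> List[Point3D]:
    """Back-project a disparity map into 3D camera coordinates.

    Assumptions of this sketch: rectified stereo pair, pinhole camera,
    `focal` in pixels, `baseline` in metres, principal point (cx, cy).
    """
    points: List[Point3D] = []
    for v, row in enumerate(disparity):        # v: pixel row
        for u, d in enumerate(row):            # u: pixel column
            if d <= min_disparity:
                continue                       # no valid depth for this pixel
            z = focal * baseline / d           # depth from disparity
            x = (u - cx) * z / focal           # lateral offset
            y = (v - cy) * z / focal           # vertical offset
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    disparity_map = [[0.0, 10.0]]
    print(disparity_to_points(disparity_map, focal=100.0, baseline=0.5,
                              cx=0.0, cy=0.0))
```

The resulting list of `(x, y, z)` points plays the role of the pseudo point cloud that a LiDAR-style 3D detector would then consume; pixels with zero disparity are skipped because their depth is undefined.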

Keywords

visual object detection; deep learning; monocular; stereo; anchor

references

Beal J, Kim E, Tzeng E, Park D H, Zhai A and Kislyuk D. 2020. Toward transformer-based object detection [EB/OL]. [2020-12-17].https://arxiv.org/pdf/2012.09958.pdfhttps://arxiv.org/pdf/2012.09958.pdf

Bochkovskiy A, Wang C Y and Mark Liao H Y. 2020. Yolov4: optimal speed and accuracy of object detection [EB/OL]. [2020-04-23].https://arxiv.org/pdf/2004.10934.pdfhttps://arxiv.org/pdf/2004.10934.pdf

Bodla N, Singh B, Chellappa R and Davis L S. 2017. Soft-NMS-Improving object detection with one line of code//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 5562-5570 [DOI: 10.1109/ICCV.2017.593http://dx.doi.org/10.1109/ICCV.2017.593]

Cai Q, Pan Y W, Wang Y, Liu J G, Yao T and Mei T. 2020. Learning a unified sample weighting network for object detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 14161-14170 [DOI: 10.1109/CVPR42600.2020.01418http://dx.doi.org/10.1109/CVPR42600.2020.01418]

Cai Z W and Vasconcelos N. 2018. Cascade R-CNN: Delving into high quality object detection//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 6154-6162 [DOI: 10.1109/CVPR.2018.00644http://dx.doi.org/10.1109/CVPR.2018.00644]

Cao J L, Cholakkal H, Anwer R M, Khan F S, Pang Y W and Shao L. 2020a. D2Det: towards high quality object detection and instance segmentation//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 11482-11491 [DOI: 10.1109/CVPR42600.2020.01150http://dx.doi.org/10.1109/CVPR42600.2020.01150]

Cao J L, Pang Y W, Han J G and Li X L. 2019b. Hierarchical shot detector//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea(South): IEEE: 9704-9713 [DOI: 10.1109/ICCV.2019.00980http://dx.doi.org/10.1109/ICCV.2019.00980]

Cao J L, Pang Y W and Li X L. 2019a. Triply supervised decoder networks for joint detection and segmentation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 7392-7401 [DOI: 10.1109/CVPR.2019.00757http://dx.doi.org/10.1109/CVPR.2019.00757]

Cao J L, Pang Y W, Zhao S J and Li X L. 2020b. High-level semantic networks for multi-scale object detection. IEEE Transactions on Circuits and Systems for Video Technology, 30(10): 3372-3386 [DOI: 10.1109/TCSVT.2019.2950526]

Cao Y H, Chen K, Loy C C and Lin D H. 2020c. Prime sample attention in object detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 11580-11588 [DOI: 10.1109/CVPR42600.2020.01160http://dx.doi.org/10.1109/CVPR42600.2020.01160]

Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A and Zagoruyko S. 2020. End-to-end object detection with transformers//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 213-229 [DOI: 10.1007/978-3-030-58452-8_13http://dx.doi.org/10.1007/978-3-030-58452-8_13]

Chang J R and Chen Y S. 2018. Pyramid stereo matching network//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 5410-5418 [DOI: 10.1109/CVPR.2018.00567http://dx.doi.org/10.1109/CVPR.2018.00567]

Chen D J, Hsieh H Y and Liu T L. 2021a. Adaptive image transformer for one-shot object detection//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 12247-12256 [DOI: 10.1109/CVPR46437.2021.01207http://dx.doi.org/10.1109/CVPR46437.2021.01207]

Chen G B, Choi W, Yu X, Han T and Chandraker M. 2017. Learning efficient object detection models with knowledge distillation//Proceedings of the 30st Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc. : 742-751

Chen K A, Li J G, Lin W Y, See J, Wang J, Duan L Y, Chen Z B, He C W and Zou J N. 2019. Towards accurate one-stage object detection with AP-loss//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 5119-5127 [DOI: 10.1109/CVPR.2019.00526http://dx.doi.org/10.1109/CVPR.2019.00526]

Chen P G, Liu S, Zhao H S and Jia J Y. 2020a. GridMask data augmentation[EB/OL]. [2022-01-18].https://arxiv.org/pdf/2001.04086.pdfhttps://arxiv.org/pdf/2001.04086.pdf

Chen Q, Wang Y M, Yang T, Zhang X Y, Cheng J and Sun J. 2021b. You only look one-level feature//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 13039-13048 [DOI: 10.1109/CVPR46437.2021.01284http://dx.doi.org/10.1109/CVPR46437.2021.01284]

Chen T, Kornblith S, Norouzi M and Hinton G. 2020b. A simple framework for contrastive learning of visual representations//Proceedings of the 37th International Conference on Machine Learning. Vienna, Austria: PMLR: 1597-1607

Chen X L and He K M. 2021c. Exploring simple Siamese representation learning//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 15750-15758 [DOI: 10.1109/CVPR46437.2021.01549http://dx.doi.org/10.1109/CVPR46437.2021.01549]

Chen X Z, Kundu K, Zhu Y K, Berneshawi A G, Ma H M, Fidler S and Urtasun R. 2015. 3D object proposals for accurate object class detection//Proceedings of the 28th Conference on Neural Information Processing Systems. Montreal, Canada: Curran Associates Inc. : 424-432 [DOI: 10.5555/2969239.2969287http://dx.doi.org/10.5555/2969239.2969287]

Chen Y H, Zhang Z, Cao Y, Wang L, Lin S and Hu H. 2020e. RepPoints v2: verification meets regression for object detection//Proceedings of the 34th Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc. : 5621-5631

Chen Y K, Li Y W, Kong T, Qi L, Chu R H, Li L and Jia J Y. 2021e. Scale-aware automatic augmentation for object detection//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 9563-9572 [DOI: 10.1109/CVPR46437.2021.00944http://dx.doi.org/10.1109/CVPR46437.2021.00944]

Chen Y K, Zhang P Z, Li Z M, Li Y W, Zhang X Y, Qi L, Sun J and Jia J Y. 2020d. Dynamic scale training for object detection [EB/OL]. [2020-03-14].https://arxiv.org/pdf/2004.12432.pdfhttps://arxiv.org/pdf/2004.12432.pdf

Chen Y L, Liu S, Shen X Y and Jia J Y. 2020c. DSGN: Deep stereo geometry network for 3D object detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 12533-12542 [DOI: 10.1109/CVPR42600.2020.01255http://dx.doi.org/10.1109/CVPR42600.2020.01255]

Chen Y X, Chen P G, Liu S, Wang L W and Jia J Y. 2021d. Deep structured instance graph for distilling object detectors//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 4359-4368 [DOI: 10.1109/ICCV48922.2021.00432http://dx.doi.org/10.1109/ICCV48922.2021.00432]

Chi C, Wei F Y and Hu H. 2020. RelationNet++: bridging visual representations for object detection via transformer decoder//Proceedings of the 34th Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc. : 13564-13574

Choe J, Joo K, Rameau F and So Kweon I S. 2021. Stereo object matching network//Proceedings of 2021 IEEE International Conference on Robotics and Automation. Xi′an, China: IEEE: 12918-12924 [DOI: 10.1109/ICRA48506.2021.9562027http://dx.doi.org/10.1109/ICRA48506.2021.9562027]

Dai J F, Li Y, He K M and Sun J. 2016. R-FCN: object detection via region-based fully convolutional networks//Proceedings of the 30th Conference on Neural Information Processing Systems. Barcelona, Spain: Curran Associates Inc. : 379-387 [DOI: 10.5555/3157096.3157139http://dx.doi.org/10.5555/3157096.3157139]

Dai J F, Qi H Z, Xiong Y W, Li Y, Zhang G D, Hu H and Wei Y C. 2017. Deformable convolutional networks//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 764-773 [DOI: 10.1109/ICCV.2017.89http://dx.doi.org/10.1109/ICCV.2017.89]

Dai X, Jiang Z R, Wu Z, Bao Y P, Wang Z C, Liu S and Zhou E J. 2021c. General instance distillation for object detection//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 7842-7851 [DOI: 10.1109/CVPR46437.2021.00775http://dx.doi.org/10.1109/CVPR46437.2021.00775]

Dai X Y, Chen Y P, Xiao B, Chen D D, Liu M C, Yuan L and Zhang L. 2021a. Dynamic head: unifying object detection heads with attentions//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 7373-7382 [DOI: 10.1109/CVPR46437.2021.00729http://dx.doi.org/10.1109/CVPR46437.2021.00729]

Dai X Y, Chen Y P, Yang J W, Zhang P C, Yuan L and Zhang L. 2021b. Dynamic DETR: End-to-end object detection with dynamic attention//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 2988-2997 [DOI: 10.1109/ICCV48922.2021.00298http://dx.doi.org/10.1109/ICCV48922.2021.00298]

Dai Z G, Cai B L, Lin Y G and Chen J Y. 2021d. UP-DETR: unsupervised pre-training for object detection with transformers//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 1601-1610 [DOI: 10.1109/CVPR46437.2021.00165http://dx.doi.org/10.1109/CVPR46437.2021.00165]

Dalal N and Triggs B. 2005. Histograms of oriented gradients for human detection//Proceedings of 2005 IEEE Conference on Computer Vision and Pattern Recognition. San Diego, USA: IEEE: 886-893 [DOI: 10.1109/CVPR.2005.177http://dx.doi.org/10.1109/CVPR.2005.177]

DeVries T and Taylor G W. 2017. Improved regularization of convolutional neural networks with cutout[EB/OL]. [2022-01-18].https://arxiv.org/pdf/1708.04552.pdfhttps://arxiv.org/pdf/1708.04552.pdf

Dong Z W, Li G X, Liao Y, Wang F, Ren P J and Qian C. 2020. CentripetalNet: pursuing high-quality Keypoint pairs for object detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 10516-10525 [DOI: 10.1109/CVPR42600.2020.01053http://dx.doi.org/10.1109/CVPR42600.2020.01053]

Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X H, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J and Houlsby N. 2021. An image is worth 16×16 words: Transformers for image recognition at scale//Proceedings of the 9th International Conference on Learning Representations. [s. l.]:https://openreview.net/forum?id=YicbFdNTTyhttps://openreview.net/forum?id=YicbFdNTTy

Duan K W, Bai S, Xie L X, Qi H G, Huang Q M and Tian Q. 2019. CenterNet: keypoint triplets for object detection//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea(South): IEEE: 6568-6577 [DOI: 10.1109/ICCV.2019.00667http://dx.doi.org/10.1109/ICCV.2019.00667]

Duan K W, Xie L X, Qi H G, Bai S, Huang Q M and Tian Q. 2020. Corner proposal network for anchor-free, two-stage object detection//Proceedings of 2020 European Conference on Computer Vision. Glasgow, UK: Springer: 399-416 [DOI: 10.1007/978-3-030-58580-8_24http://dx.doi.org/10.1007/978-3-030-58580-8_24]

Dvornik N, Shmelkov K, Mairal J and Schmid C. 2017. BlitzNet: A real-time deep network for scene understanding//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 4174-4182 [DOI: 10.1109/ICCV.2017.447http://dx.doi.org/10.1109/ICCV.2017.447]

Everingham M, Van Gool L, Williams C K I, Winn J and Zisserman A. 2010. The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2): 303-338 [DOI: 10.1007/s11263-009-0275-4]

Fang H S, Sun J H, Wang R Z, Gou M H, Li Y L and Lu C W. 2019. InstaBoost: boosting instance segmentation via probability map guided copy-pasting//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea(South): IEEE: 682-691 [DOI: 10.1109/ICCV.2019.00077http://dx.doi.org/10.1109/ICCV.2019.00077]

Feng C J, Zhong Y J, Gao Y, Scott M R and Huang W L. 2021b. TOOD: task-aligned one-stage object detection//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 3510-3519 [DOI: 10.1109/ICCV48922.2021.00349http://dx.doi.org/10.1109/ICCV48922.2021.00349]

Feng C J, Zhong Y J and Huang W L. 2021a. Exploring classification equilibrium in long-tailed object detection//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 3417-3426 [DOI: 10.1109/ICCV48922.2021.00340http://dx.doi.org/10.1109/ICCV48922.2021.00340]

Fu C Y, Liu W, Ranga A, Tyagi A and Berg A C. 2017. DSSD: deconvolutional single shot detector [EB/OL]. [2022-01-18].https://arxiv.org/pdf/1701.06659.pdfhttps://arxiv.org/pdf/1701.06659.pdf

Gao A Q, Cao J L and Pang Y W. 2021a. Shape prior non-uniform sampling guided real-time stereo 3D object detection [EB/OL]. [2021-06-22].https://arxiv.org/pdf/2106.10013.pdfhttps://arxiv.org/pdf/2106.10013.pdf

Gao Z T, Wang L M and Wu G S. 2021b. Mutual supervision for dense object detection//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 3641-3650 [DOI: 10.1109/ICCV48922.2021.00362http://dx.doi.org/10.1109/ICCV48922.2021.00362]

Ge Z, Liu S T, Li Z M, Yoshie O and Sun J. 2021. OTA: optimal transport assignment for object detection//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 303-312 [DOI: 10.1109/CVPR46437.2021.00037http://dx.doi.org/10.1109/CVPR46437.2021.00037]

Geiger A, Lenz P and Urtasun R. 2012. Are we ready for autonomous driving? The KITTI vision benchmark suite//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, USA: IEEE: 3354-3361 [DOI: 10.1109/CVPR.2012.6248074http://dx.doi.org/10.1109/CVPR.2012.6248074]

Ghiasi G, Lin T Y and Le Q V. 2019. NAS-FPN: Learning scalable feature pyramid architecture for object detection//Proceedings of 2019 IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA: 7036-7045 [DOI: 10.1109/CVPR.2019.00720http://dx.doi.org/10.1109/CVPR.2019.00720]

Girshick R. 2015. Fast R-CNN//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 1440-1448 [DOI: 10.1109/ICCV.2015.169http://dx.doi.org/10.1109/ICCV.2015.169]

Girshick R, Donahue J, Darrell T and Malik J. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 580-587 [DOI: 10.1109/CVPR.2014.81http://dx.doi.org/10.1109/CVPR.2014.81]

Guo C X, Fan B, Zhang Q, Xiang S M and Pan C H. 2020a. AugFPN: Improving multi-scale feature learning for object detection//Proceedings of 2020 IEEE Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 12592-12601 [DOI: 10.1109/CVPR42600.2020.01261http://dx.doi.org/10.1109/CVPR42600.2020.01261]

Guo J Y, Han K, Wang Y H, Wu H, Chen X H, Xu C J and Xu C. 2021a. Distilling object detectors via decoupled features//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 2154-2164 [DOI: 10.1109/CVPR46437.2021.00219http://dx.doi.org/10.1109/CVPR46437.2021.00219]

Guo J Y, Han K, Wang Y H, Zhang C, Yang Z H, Wu H, Chen X H and Xu C. 2020b. Hit-detector: hierarchical trinity architecture search for object detection//Proceedings of 2020 IEEE Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 11402-11411 [DOI: 10.1109/CVPR42600.2020.01142http://dx.doi.org/10.1109/CVPR42600.2020.01142]

Guo X Y, Shi S S, Wang X G and Li H S. 2021b. LIGA-stereo: learning Lidar geometry aware representations for stereo-based 3D detector//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 3153-3163

Gupta A, Dollar P and Girshick R. 2019. LVIS: a dataset for large vocabulary instance segmentation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 5356-5364 [DOI: 10.1109/CVPR.2019.00550http://dx.doi.org/10.1109/CVPR.2019.00550]

Han K, Wang Y H, Tian Q, Guo J Y, Xu C J and Xu C. 2020. GhostNet: more features from cheap operations//Proceedings of 2020 IEEE Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 1577-1586 [DOI: 10.1109/CVPR42600.2020.00165http://dx.doi.org/10.1109/CVPR42600.2020.00165]

He K M, Fan H Q, Wu Y X, Xie S N and Girshick R. 2020. Momentum contrast for unsupervised visual representation learning//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 9726-9735 [DOI: 10.1109/CVPR42600.2020.00975http://dx.doi.org/10.1109/CVPR42600.2020.00975]

He K M, Gkioxari G, Dollár P and Girshick R. 2017. Mask R-CNN//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2980-2988 [DOI: 10.1109/ICCV.2017.322http://dx.doi.org/10.1109/ICCV.2017.322]

He K M, Zhang X Y, Ren S Q and Sun J. 2015. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9): 1904-1916 [DOI: 10.1109/TPAMI.2015.2389824]

He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90http://dx.doi.org/10.1109/CVPR.2016.90]

He Y H, Zhang X Y, Savvides M and Kitani K. 2018. Softer-NMS: rethinking bounding box regression for accurate object detection[EB/OL]. [2022-01-18].http://arxiv.org/pdf/1809.08545.pdfhttp://arxiv.org/pdf/1809.08545.pdf

Hu M, Li Y L, Fang L and Wang S J. 2021. A2-FPN: attention aggregation based feature pyramid network for instance segmentation//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 15343-15352 [DOI: 10.1109/CVPR46437.2021.01509http://dx.doi.org/10.1109/CVPR46437.2021.01509].

Huang G, Liu Z, van der Maaten L and Weinberger K Q. 2017. Densely connected convolutional networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2261-2269 [DOI: 10.1109/CVPR.2017.243http://dx.doi.org/10.1109/CVPR.2017.243]

Huang L C, Yang Y, Deng Y F and Yu Y N. 2015. DenseBox: unifying landmark localization with end to end object detection[EB/OL]. [2022-01-18].http://arxiv.org/pdf/1509.04874.pdfhttp://arxiv.org/pdf/1509.04874.pdf

Jiang B R, Luo R X, Mao J Y, Xiao T T and Jiang Y N. 2018. Acquisition of localization confidence for accurate object detection//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 816-832 [DOI: 10.1007/978-3-030-01264-9_48http://dx.doi.org/10.1007/978-3-030-01264-9_48]

Jiang W T, Zhang C, Zhang S C and Liu W J. 2019. Multiscale feature map fusion algorithm for target detection. Journal of Image and Graphics, 24(11): 1918-1931

姜文涛, 张驰, 张晟翀, 刘万军. 2019. 多尺度特征图融合的目标检测. 中国图象图形学报, 24(11): 1918-1931 [DOI: 10.11834/jig.190021]

Joseph K J, Khan S, Khan F S and Balasubramanian V N. 2021. Towards open world object detection//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 5830-5840 [DOI: 10.1109/CVPR46437.2021.00577http://dx.doi.org/10.1109/CVPR46437.2021.00577]

Kang B Y, Liu Z, Wang X, Yu F, Feng J S and Darrell T. 2019. Few-shot object detection via feature reweighting//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea(South): IEEE: 8419-8428 [DOI: 10.1109/ICCV.2019.00851http://dx.doi.org/10.1109/ICCV.2019.00851]

Ke W, Zhang T L, Huang Z Y, Ye Q X, Liu J Z and Huang D. 2020. Multiple anchor learning for visual object detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 10203-10212 [DOI: 10.1109/CVPR42600.2020.01022http://dx.doi.org/10.1109/CVPR42600.2020.01022]

Kim K and Lee H S. 2020. Probabilistic anchor assignment with IoU prediction for object detection//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 355-371 [DOI: 10.1007/978-3-030-58595-2_22http://dx.doi.org/10.1007/978-3-030-58595-2_22]

Kim S W, Kook H K, Sun J Y, Kang M C and Ko S J. 2018. Parallel feature pyramid network for object detection//Proceedings of the 15th Conference on Computer Vision. Munich, Germany: Springer: 239-256 [DOI: 10.1007/978-3-030-01228-1_15http://dx.doi.org/10.1007/978-3-030-01228-1_15]

Kong T, Sun F C, Huang W B and Liu H P. 2018. Deep feature pyramid reconfiguration for object detection//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 172-188 [DOI: 10.1007/978-3-030-01228-1_11http://dx.doi.org/10.1007/978-3-030-01228-1_11]

Kong T, Sun F C, Liu H P, Jiang Y N, Li L and Shi J B. 2019. FoveaBox: Beyond anchor-based object detector[EB/OL]. [2022-01-18].http://arxiv.org/pdf/1904.03797.pdfhttp://arxiv.org/pdf/1904.03797.pdf

Königshof H, Salscheider N O and Stiller C. 2019. Realtime 3D object detection for automated driving using stereo vision and semantic information//Proceedings of 2019 IEEE Intelligent Transportation Systems Conference. Auckland, New Zealand: IEEE: 1405-1410 [DOI: 10.1109/ITSC.2019.8917330http://dx.doi.org/10.1109/ITSC.2019.8917330]

Krizhevsky A, Sutskever I and Hinton G E. 2012. ImageNet classification with deep convolutional neural networks//Proceedings of the 25th Conference on Neural Information Processing Systems. Lake Tahoe, USA: Curran Associates Inc. : 1106-1114

Lan S Y, Ren Z, Wu Y, Davis L S and Hua G. 2020. SaccadeNet: A fast and accurate object detector//Proceedings of 2020 IEEE Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 10394-10403 [DOI: 10.1109/CVPR42600.2020.01041http://dx.doi.org/10.1109/CVPR42600.2020.01041]

Law H and Deng J. 2018. CornerNet: detecting objects as paired keypoints//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 765-781 [DOI: 10.1007/978-3-030-01264-9_45http://dx.doi.org/10.1007/978-3-030-01264-9_45]

Law H, Teng Y, Russakovsky O and Deng J. 2020. CornerNet-lite: efficient keypoint based object detection//Proceedings of the British Machine Vision Conference. [s. l.]: BMVA

Li B Y, Liu Y and Wang X G. 2019a. Gradient harmonized single-stage detector//Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Honolulu, USA: AAAI: 8577-8584 [DOI: 10.1609/aaai.v33i01.33018577http://dx.doi.org/10.1609/aaai.v33i01.33018577]

Li C Y, Ku J and Waslander S L. 2020a. Confidence guided stereo 3D object detection with split depth estimation//Proceedings of 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems. Las Vegas, USA: IEEE: 5776-5783 [DOI: 10.1109/IROS45743.2020.9341188http://dx.doi.org/10.1109/IROS45743.2020.9341188]

Li H D, Wu Z X, Zhu C, Xiong C M, Socher R and Davis L S. 2020b. Learning from noisy anchors for one-stage object detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 10585-10594 [DOI: 10.1109/CVPR42600.2020.01060http://dx.doi.org/10.1109/CVPR42600.2020.01060]

Li H H, Zhou K P, Han T C. 2000. Ship object detection based on SSD improved with CReLU and FPN. Chinese Journal of Scientific Instrument, 41(4): 183-190

李晖晖, 周康鹏, 韩太初. 2000. 基于CReLU和FPN改进的SSD舰船目标检测. 仪器仪表学报, 41(4): 183-190 [DOI: 10.19650/j.cnki.cjsi.J2006122]

Li P L, Chen X Z and Shen S J. 2019b. Stereo R-CNN based 3D object detection for autonomous driving//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 7644-7652 [DOI: 10.1109/CVPR.2019.00783http://dx.doi.org/10.1109/CVPR.2019.00783]

Li P X, Su S and Zhao H C. 2021a. RTS3D: real-time stereo 3D detection from 4D feature-consistency embedding space for autonomous driving//Proceedings of the 35th AAAI Conference on Artificial Intelligence. [s. l.]: AAAI: 1930-1939

Li Q Q, Jin S Y and Yan J J. 2017. Mimicking very efficient network for object detection//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 7341-7349 [DOI: 10.1109/CVPR.2017.776http://dx.doi.org/10.1109/CVPR.2017.776]

Li X, Wang W H, Hu X L, Li J, Tang J H and Yang J. 2021b. Generalized focal loss V2: learning reliable localization quality estimation for dense object detection//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 11632-11641 [DOI: 10.1109/CVPR46437.2021.01146http://dx.doi.org/10.1109/CVPR46437.2021.01146]

Li X, Wang W H, Wu L J, Chen S, Hu X L, Li J, Tang J H and Yang J. 2020c. Generalized focal loss: learning qualified and distributed bounding boxes for dense object detection//Proceedings of the 34th Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc. : 21002-21012

Li Y, Wang T, Kang B Y, Tang S, Wang C F, Li J T and Feng J S. 2020e. Overcoming classifier imbalance for long-tail object detection with balanced group Softmax//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 10988-10997 [DOI: 10.1109/CVPR42600.2020.01100http://dx.doi.org/10.1109/CVPR42600.2020.01100]

Li Y H, Chen Y T, Wang N Y and Zhang Z X. 2019c. Scale-aware trident networks for object detection//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea(South): IEEE: 6053-6062 [DOI: 10.1109/ICCV.2019.00615http://dx.doi.org/10.1109/ICCV.2019.00615]

Li Y Z, Pang Y W, Shen J B, Cao J L and Shao L. 2020d. NETNet: neighbor erasing and transferring network for better single shot object detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 13349-13358 [DOI: 10.1109/CVPR42600.2020.01336http://dx.doi.org/10.1109/CVPR42600.2020.01336]

Liang T T, Wang Y T, Tang Z, Hu G S and Ling H B. 2021. OPANAS: one-shot path aggregation network architecture search for object detection//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 10195-10203 [DOI: 10.1109/CVPR46437.2021.01006http://dx.doi.org/10.1109/CVPR46437.2021.01006]

Lin T Y, Dollár P, Girshick R, He K M, Hariharan B and Belongie S. 2017a. Feature pyramid networks for object detection//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 936-944 [DOI: 10.1109/CVPR.2017.106http://dx.doi.org/10.1109/CVPR.2017.106]

Lin T Y, Goyal P, Girshick R, He K M and Dollár P. 2017b. Focal loss for dense object detection//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2999-3007 [DOI: 10.1109/ICCV.2017.324http://dx.doi.org/10.1109/ICCV.2017.324]

Lin T Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P and Zitnick C L. 2014. MicrosoftCOCO: common objects in context//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 740-755 [DOI: 10.1007/978-3-319-10602-1_48http://dx.doi.org/10.1007/978-3-319-10602-1_48]

Liu J, Li D, Zheng R Z, Tian L and Shan Y. 2021a. RankDetNet: Delving into ranking constraints for object detection//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 264-273 [DOI: 10.1109/CVPR46437.2021.00033http://dx.doi.org/10.1109/CVPR46437.2021.00033]

Liu S, Qi L, Qin H F, Shi J P and Jia J Y. 2018b. Path aggregation network for instance segmentation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 8759-8768 [DOI: 10.1109/CVPR.2018.00913http://dx.doi.org/10.1109/CVPR.2018.00913]

Liu S T, Huang D and Wang Y H. 2018a. Receptive field block net for accurate and fast object detection//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 404-419 [DOI: 10.1007/978-3-030-01252-6_24http://dx.doi.org/10.1007/978-3-030-01252-6_24]

Liu S T, Huang D and Wang Y H. 2019. Learning spatial fusion for single-shot object detection [EB/OL]. [2022-01-18].http://arxiv.org/pdf/1911.09516.pdfhttp://arxiv.org/pdf/1911.09516.pdf

Liu S T, Li Z M and Sun J. 2020a. Self-EMD: Self-supervised object detection without ImageNet [EB/OL]. [2022-01-18].http://arxiv.org/pdf/2011.13677.pdfhttp://arxiv.org/pdf/2011.13677.pdf

Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y and Berg A C. 2016. SSD: single shot multiBox detector//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 21-37 [DOI: 10.1007/978-3-319-46448-0_2http://dx.doi.org/10.1007/978-3-319-46448-0_2]

Liu Y X, Wang L J and Liu M. 2021b. YOLOStereo3D: a step back to 2 d for efficient stereo 3D detection//Proceedings of 2021 IEEE/CVF International Conference on Robotics and Automation. Xi′an, China: IEEE: 13018-13024 [DOI: 10.1109/ICRA48506.2021.9561423http://dx.doi.org/10.1109/ICRA48506.2021.9561423]

Liu Z, Lin Y T, Cao Y, Hu H, Wei Y X, Zhang Z, Lin S and Guo B N. 2021c. Swin transformer: hierarchical vision transformer using shifted windows//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 10012-10022 [DOI: 10.1109/ICCV48922.2021.00986http://dx.doi.org/10.1109/ICCV48922.2021.00986]

Liu Z L, Zheng T, Xu G D, Yang Z, Liu H F and Cai D. 2020b. Training-time-friendly network for real-time object detection//Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI: 11685-11692 [DOI: 10.1609/aaai.v34i07.6838http://dx.doi.org/10.1609/aaai.v34i07.6838]

Lu X, Li B Y, Yue Y X, Li Q Q and Yan J J. 2019. Grid R-CNN//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 7363-7372 [DOI: 10.1109/CVPR.2019.00754http://dx.doi.org/10.1109/CVPR.2019.00754]

Ma Y C, Liu S T, Li Z M and Sun J. 2021. IQDet: instance-wise quality distribution sampling for object detection//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 1717-1725 [DOI: 10.1109/CVPR46437.2021.00176http://dx.doi.org/10.1109/CVPR46437.2021.00176]

Mayer N, Ilg E, Häusser P, Fischer P, Cremers D, Dosovitskiy A and Brox T. 2016. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 4040-4048 [DOI: 10.1109/CVPR.2016.438http://dx.doi.org/10.1109/CVPR.2016.438]

Newell A, Yang K Y and Deng J. 2016. Stacked hourglass networks for human pose estimation//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 483-499 [DOI: 10.1007/978-3-319-46484-8_29http://dx.doi.org/10.1007/978-3-319-46484-8_29]

Nie J, Anwer R M, Cholakkal H, Khan F S, Pang Y W and Shao L. 2019. Enriched feature guided refinement network for object detection//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea(South): IEEE: 9536-9545 [DOI: 10.1109/ICCV.2019.00963http://dx.doi.org/10.1109/ICCV.2019.00963]

Oksuz K, Cam B C, Akbas E and Kalkan S. 2021. Rank and sort loss for object detection and instance segmentation//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 3009-3018 [DOI: 10.1109/ICCV48922.2021.00300http://dx.doi.org/10.1109/ICCV48922.2021.00300]

Pang J M, Chen K, Shi J P, Feng H J, Ouyang W L and Lin D H. 2019. Libra R-CNN: towards balanced learning for object detection//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 821-830 [DOI: 10.1109/CVPR.2019.00091http://dx.doi.org/10.1109/CVPR.2019.00091]

Pato L V, Negrinho R and Aguiar P M Q. 2020. Seeing without looking: Contextual rescoring of object detections for AP maximization//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 14598-14606 [DOI: 10.1109/CVPR42600.2020.01462http://dx.doi.org/10.1109/CVPR42600.2020.01462]

Peng W L, Pan H, Liu H and Sun Y. 2020. IDA-3D: instance-depth-aware 3D object detection from stereo vision for autonomous driving//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 13012-13021 [DOI: 10.1109/CVPR42600.2020.01303http://dx.doi.org/10.1109/CVPR42600.2020.01303]

Peng X D, Zhu X G, Wang T and Ma Y X. 2022. SIDE: center-based stereo 3D detector with structure-aware instance depth estimation//Proceedings of 2022 IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa, USA: IEEE: 119-128 [DOI: 10.1109/WACV51458.2022.00030http://dx.doi.org/10.1109/WACV51458.2022.00030]

Pham C C and Jeon J W. 2017. Robust object proposals re-ranking for object detection in autonomous driving using convolutional neural networks. Signal Processing: Image Communication, 53: 110-122 [DOI: 10.1016/j.image.2017.02.007]

Pon A D, Ku J, Li C Y and Waslander S L. 2020. Object-centric stereo matching for 3D object detection//Proceedings of 2020 IEEE/CVF International Conference on Robotics and Automation. Paris, France: IEEE: 8383-8389 [DOI: 10.1109/ICRA40945.2020.9196660http://dx.doi.org/10.1109/ICRA40945.2020.9196660]

Qian Q, Chen L, Li H and Jin R. 2020a. DR loss: improving object detection by distributional ranking//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 12161-12169 [DOI: 10.1109/CVPR42600.2020.01218http://dx.doi.org/10.1109/CVPR42600.2020.01218]

Qian R, Garg D, Wang Y, You Y R, Belongie S, Hariharan B, Campbell M, Weinberger K Q and Chao W L. 2020b. End-to-end pseudo-liDAR for image-based 3D object detection//Proceedings of 2020 IEEE/CVF Conference onComputer Vision and Pattern Recognition. Seattle, USA: IEEE: 5880-5889 [DOI: 10.1109/CVPR42600.2020.00592http://dx.doi.org/10.1109/CVPR42600.2020.00592]

Qiao S Y, Chen L C and Yuille A. 2021. DetectoRS: detecting objects with recursive feature pyramid and switchable atrous convolution//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 10213-10224 [DOI: 10.1109/CVPR46437.2021.01008http://dx.doi.org/10.1109/CVPR46437.2021.01008]

Qin Z Y, Wang J L and Lu Y. 2019. Triangulation learning network: from monocular to stereo 3D object detection//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 7615-7623 [DOI: 10.1109/CVPR.2019.00780http://dx.doi.org/10.1109/CVPR.2019.00780]

Qiu H, Ma Y C, Li Z M, Liu S T and Sun J. 2020b. BorderDet: Border feature for dense object detection//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 549-564 [DOI: 10.1007/978-3-030-58452-8_32http://dx.doi.org/10.1007/978-3-030-58452-8_32]

Qiu H Q, Li H L, Wu Q B and Shi H C. 2020a. Offset bin classification network for accurate object detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 13185-13194 [DOI: 10.1109/CVPR42600.2020.01320http://dx.doi.org/10.1109/CVPR42600.2020.01320]

Qiu H Q, Li H L, Wu Q B, Cui J H, Song Z C, Wang L X and Zhang M J. 2021. CrossDet: crossline representation for object detection//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 3195-3204 [DOI: 10.1109/ICCV48922.2021.00318http://dx.doi.org/10.1109/ICCV48922.2021.00318]

Redmon J, Divvala S, Girshick R and Farhadi A. 2016. You only look once: Unified, real-time object detection//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 779-788 [DOI: 10.1109/CVPR.2016.91http://dx.doi.org/10.1109/CVPR.2016.91]

Redmon J and Farhadi A. 2017. YOLO9000: better, faster, stronger//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6517-6525 [DOI: 10.1109/CVPR.2017.690http://dx.doi.org/10.1109/CVPR.2017.690]

Redmon J and Farhadi A. 2018. YOLOv3: an incremental improvement[EB/OL]. [2022-01-18].https://arxiv.org/pdf/1804.02767.pdfhttps://arxiv.org/pdf/1804.02767.pdf

Ren S Q, He K M, Girshick R and Sun J. 2015. Faster R-CNN: towards real-time object detection with region proposal networks//Proceedings of the 28th Conference on Neural Information Processing Systems. Montreal, Canada: Curran Associates Inc. : 91-99

Rezatofighi H, Tsoi N, Gwak J Y, Sadeghian A, Reid I and Savarese S. 2019. Generalized intersection over union: a metric and a loss for bounding box regression//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 658-666 [DOI: 10.1109/CVPR.2019.00075http://dx.doi.org/10.1109/CVPR.2019.00075]

Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S A, Huang Z H, Karpathy A, Khosla A, Bernstein M, Berg A C and Li F F. 2015. Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3): 211-252 [DOI: 10.1007/s11263-015-0816-y]

Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R and LeCun Y. 2014. OverFeat: Integrated recognition, localization and detection using convolutional networks//Proceedings of the 2nd International Conference on Learning Representations. Banff, Canada: [s. n.]

Shi Y G, Guo Y, Mi Z Q and Li X J. 2022. Stereo centerNet-based 3D object detection for autonomous driving. Neurocomputing, 471: 219-229 [DOI: 10.1016/j.neucom.2021.11.048]

Simonyan K and Zisserman A. 2014. Very deep convolutional networks for large-scale image recognition [EB/OL]. [2022-01-18].https://arxiv.org/pdf/1409.1556.pdfhttps://arxiv.org/pdf/1409.1556.pdf

Singh B and Davis L S. 2018. An analysis of scale invariance in object detection-SNIP//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 3578-3587 [DOI: 10.1109/CVPR.2018.00377http://dx.doi.org/10.1109/CVPR.2018.00377]

Singh B, Najibi M and Davis L S. 2018. SNIPER: efficient multi-scale training//Proceedings of the 32nd Conference on Neural Information Processing Systems. Montréal, Canada: Curran Associates Inc. : 9333-9343

Singh K K and Lee Y J. 2017. Hide-and-seek: forcing a network to be meticulous for weakly-supervised object and action localization//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 3544-3553 [DOI: 10.1109/ICCV.2017.381http://dx.doi.org/10.1109/ICCV.2017.381]

Song G L, Liu Y and Wang X G. 2020. Revisiting the sibling head in object detector//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 11560-11569 [DOI: 10.1109/CVPR42600.2020.01158http://dx.doi.org/10.1109/CVPR42600.2020.01158]

Sun B, Li B H, Cai S C, Yuan Y and Zhang C. 2021a. FSCE: few-shot object detection via contrastive proposal encoding//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. [s. l.]: IEEE: 7352-7362

Sun J M, Chen L H, Xie Y M, Zhang S Y, Jiang Q H, Zhou X W and Bao H J. 2020a. DISP R-CNN: stereo 3D object detection via shape prior guided instance disparity estimation//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 10545-10554 [DOI: 10.1109/CVPR42600.2020.01056http://dx.doi.org/10.1109/CVPR42600.2020.01056]

Sun P Z,Jiang Y, Xie E Z, Shao W Q, Yuan Z H, Wang C H and Luo P. 2021b. What makes for end-to-end object detection//Proceedings of the 38th International Conference on Machine Learning. [s. l.]: PMLR: 9934-9944

Sun P Z, Zhang R F, Jiang Y, Kong T, Xu C F, Zhan W, Tomizuka M, Li L, Yuan Z H, Wang C H and Luo P. 2021c. Sparse R-CNN: end-to-end object detection with learnable proposals//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 14454-14463 [DOI: 10.1109/CVPR46437.2021.01422http://dx.doi.org/10.1109/CVPR46437.2021.01422]

Sun R Y, Tang F H, Zhang X P, Xiong H K and Tian Q. 2020b. Distilling object detectors with task adaptive regularization[EB/OL]. [2022-01-18].https://arxiv.org/pdf/2006.13108.pdfhttps://arxiv.org/pdf/2006.13108.pdf

Sun Z Q, Cao S C, Yang Y M and Kitani K. 2021 d. Rethinking transformer-based set prediction for object detection//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 3611-3620 [DOI: 10.1109/ICCV48922.2021.00359http://dx.doi.org/10.1109/ICCV48922.2021.00359]

Szegedy C, Toshev A and Erhan D. 2013. Deep neural networks for object detection//Proceedings of the 26th Conference on Neural Information Processing Systems. Lake Tahoe, USA: Curran Associates Inc. : 2553-2561

Tan J R, Lu X, Zhang G, Yin C Q and Li Q Q. 2021. Equalization loss v2: a new gradient balance approach for long-tailed object detection//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 1685-1694 [DOI: 10.1109/CVPR46437.2021.00173http://dx.doi.org/10.1109/CVPR46437.2021.00173]

Tan J R, Wang C B, Li B Y, Li Q Q, Ouyang W L, Yin C Q and Yan J J. 2020a. Equalization loss for long-tailed object recognition//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 11659-11668 [DOI: 10.1109/CVPR42600.2020.01168http://dx.doi.org/10.1109/CVPR42600.2020.01168]

Tan M X, Pang R M and Le Q V. 2020b. EfficientDet: Scalable and efficient object detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 10778-10787 [DOI: 10.1109/CVPR42600.2020.01079http://dx.doi.org/10.1109/CVPR42600.2020.01079]

Tian Z, Shen C H, Chen H and He T. 2019. FCOS: fully convolutional one-stage object detection//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea(South): IEEE: 9626-9635 [DOI: 10.1109/ICCV.2019.00972http://dx.doi.org/10.1109/ICCV.2019.00972]

Uijlings J R R, Van De Sande K E A, Gevers T and Smeulders A W M. 2013. Selective search for object recognition. International Journal of Computer Vision, 104(2): 154-171 [DOI: 10.1007/s11263-013-0620-5]

Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser L and Polosukhin I. 2017. Attention is all you need//Proceedings of the 31st Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc. : 6000-6010 [DOI: 10.5555/3295222.3295349http://dx.doi.org/10.5555/3295222.3295349]

Viola P and Jones M J. 2004. Robust real-time face detection. International Journal of Computer Vision, 57(2): 137-154 [DOI: 10.1023/B:VISI.0000013087.49260.fb]

Vu T, Jang H, Pham T X and Yoo C D. 2019. Cascade RPN: delving into high-quality region proposal network with adaptive convolution//Proceedings of the 33rd Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc. : 1430-1440

Wang J F, Song L, Li Z M, Sun H B, Sun J and Zheng N N. 2021a. End-to-end object detection with fully convolutional network//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 15849-15858 [DOI: 10.1109/CVPR46437.2021.01559http://dx.doi.org/10.1109/CVPR46437.2021.01559]

Wang J Q, Chen K, Yang S, Loy C C and Lin D H. 2019a. Region proposal by guided anchoring//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 2965-2974 [DOI: 10.1109/CVPR.2019.00308http://dx.doi.org/10.1109/CVPR.2019.00308]

Wang J Q, Zhang W W, Cao Y H, Chen K, Pang J M, Gong T, Shi J P, Loy C C and Lin D H. 2020a. Side-aware boundary localization for more precise object detection//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 403-419 [DOI: 10.1007/978-3-030-58548-8_24http://dx.doi.org/10.1007/978-3-030-58548-8_24]

Wang K Y and Zhang L. 2021b. Reconcile prediction consistency for balanced object detection//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal: IEEE: 3631-3640

Wang N, Gao Y, Chen H, Wang P, Tian Z, Shen C H and Zhang Y N. 2020b. NAS-FCOS: fast neural architecture search for object detection//Proceedings of 2020 IEEE/CVF Conference on Computer Visionand Pattern Recognition. Seattle, USA: IEEE: 11940-11948 [DOI: 10.1109/CVPR42600.2020.01196http://dx.doi.org/10.1109/CVPR42600.2020.01196]

Wang T, Yuan L, Zhang X P and Feng J S. 2019c. Distilling object detectors with fine-grained feature imitation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 4933-4942 [DOI: 10.1109/CVPR.2019.00507http://dx.doi.org/10.1109/CVPR.2019.00507]

Wang T C, Anwer R M, Cholakkal H, Khan F S, Pang Y W and Shao L. 2019b. Learning rich features at high-speed for single-shot object detection//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea(South): IEEE: 1971-1980 [DOI: 10.1109/ICCV.2019.00206http://dx.doi.org/10.1109/ICCV.2019.00206]

Wang W H, Xie E Z, Li X, Fan D P, Song K T, Liang D, Lu T, Luo P and Shao L. 2021c. Pyramid vision transformer: a versatile backbone for dense prediction without convolutions//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 568-578 [DOI: 10.1109/ICCV48922.2021.00061http://dx.doi.org/10.1109/ICCV48922.2021.00061]

Wang X J, Zhang S L, Yu Z R, Feng L T and Zhang W. 2020c. Scale-equalizing pyramid convolution for object detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 13356-13365 [DOI: 10.1109/CVPR42600.2020.01337http://dx.doi.org/10.1109/CVPR42600.2020.01337]

Wang Y, Chao W L, Garg D, Hariharan B, Campbell M and Weinberger K Q. 2019d. Pseudo-LiDAR from visual depth estimation: Bridging the gap in 3D object detection for autonomous driving//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 8445-8453 [DOI: 10.1109/CVPR.2019.00864http://dx.doi.org/10.1109/CVPR.2019.00864]

Wang Y, Yang B, Hu R, Liang M and Urtasun R. 2021d. PLUMENet: efficient 3D object detection from stereo images//Proceedings of 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems. Prague, Czech Republic: IEEE: 3383-3390 [DOI: 10.1109/IROS51168.2021.9635875http://dx.doi.org/10.1109/IROS51168.2021.9635875]

Wei F Y, Sun X, Li H Y, Wang J D and Lin S. 2020. Point-set anchors for object detection, instance segmentation and pose estimation//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 527-544 [DOI: 10.1007/978-3-030-58607-2_31http://dx.doi.org/10.1007/978-3-030-58607-2_31]

Wu J L, Song L C, Wang T C, Zhang Q and Yuan J S. 2020a. Forest R-CNN: large-vocabulary long-tailed object detection and instance segmentation//Proceedings of 2020 ACM Multimedia. Seattle, USA: ACM: 1570-1578 [DOI: 10.1145/3394171.3413970http://dx.doi.org/10.1145/3394171.3413970]

Wu Y, Chen Y P, Yuan L, Liu Z C, Wang L J, Li H Z and Fu Y. 2020b. Rethinking classification and localization for object detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 10183-10192 [DOI: 10.1109/CVPR42600.2020.01020http://dx.doi.org/10.1109/CVPR42600.2020.01020]

Xie E Z, Ding J, Wang W H, Zhan X H, Xu H, Sun P Z, Li Z G and Luo P. 2021. DetCo: unsupervised contrastive learning for object detection//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 8392-8401 [DOI: 10.1109/ICCV48922.2021.00828http://dx.doi.org/10.1109/ICCV48922.2021.00828]

Xu B and Chen Z Z. 2018. Multi-level fusion based 3D object detection from monocular images//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: 2345-2353 [DOI: 10.1109/CVPR.2018.00249http://dx.doi.org/10.1109/CVPR.2018.00249]

Xu H, Yao L W, Zhang W, Liang X D and Li Z G. 2019. Auto-FPN: automatic network architecture adaptation for object detection beyond classification//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea(South): IEEE: 6648-6657 [DOI: 10.1109/ICCV.2019.00675http://dx.doi.org/10.1109/ICCV.2019.00675]

Xu H F and Zhang J Y. 2020. AANet: adaptive aggregation network for efficient stereo matching//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: 1956-1965 [DOI: 10.1109/CVPR42600.2020.00203http://dx.doi.org/10.1109/CVPR42600.2020.00203]

Xu Z B, Zhang W, Ye X Q, Tan X, Yang W, Wen S L, Ding E R, Meng A J and Huang L S. 2020. ZoomNet: part-aware adaptive zooming neural network for 3D object detection//Proceedings of 2020 AAAI Conference on Artificial Intelligence. New York, USA: AAAI: 12557-12564

Yamaguchi K, McAllester D and Urtasun R. 2014. Efficient joint segmentation, occlusion labeling, stereo and flow estimation//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 756-771 [DOI: 10.1007/978-3-319-10602-1_49http://dx.doi.org/10.1007/978-3-319-10602-1_49]

Yang Z, Liu S H, Hu H, Wang L W and Lin S. 2019. RepPoints: point set representation for object detection//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea(South): IEEE: 9656-9665 [DOI: 10.1109/ICCV.2019.00975http://dx.doi.org/10.1109/ICCV.2019.00975]

Yang Z, Xu Y H, Xue H, Zhang Z, Urtasun R, Wang L W, Lin S and Hu H. 2020. Dense RepPoints: representing visual objects with dense point sets//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 227-244 [DOI: 10.1007/978-3-030-58589-1_14http://dx.doi.org/10.1007/978-3-030-58589-1_14]

Yao L W, Pi R J, Xu H, Zhang W, Li Z G and Zhang T. 2021. G-DetKD: towards general distillation framework for object detectors via contrastive and semantic-guided feature imitation//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 3591-3600 [DOI: 10.1109/ICCV48922.2021.00357http://dx.doi.org/10.1109/ICCV48922.2021.00357]

Yao Q L, Hu X and Lei H. 2019. Aircraft detection in remote sensing imagery with multi-scale feature fusion convolutional neural networks. Acta Geodaetica et Cartographica Sinica, 48(10): 1266-1274

姚群力, 胡显, 雷宏. 2019. 基于多尺度融合特征卷积神经网络的遥感图像飞机目标检测. 测绘学报, 48(10): 1266-1274 [DOI: 10.11947/j.AGCS.2019.20180398]

Yoo J, Lee H, Chung I, Seo G and Kwak N. 2021. Training multi-object detector by estimating bounding box distribution for input image//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 3437-3446 [DOI: 10.1109/ICCV48922.2021.00342http://dx.doi.org/10.1109/ICCV48922.2021.00342]

You Y R, Wang Y, Chao W L, Garg D, Pleiss G, Hariharan B, Campbell M and Weinberger K Q. 2020. Pseudo-LiDAR++: accurate depth for 3D object detection in autonomous driving//Proceedings of 2020 International Conference on Learning Representations. Addis Ababa, Ethiopia: OpenReview. net

Yu J H, Jiang Y N, Wang Z Y, Cao Z M and Huang T. 2016. UnitBox: an advanced object detection network//Proceedings of 2016 ACM Multimedia. Amsterdam, the Netherlands: ACM: 516-520 [DOI: 10.1145/2964284.2967274http://dx.doi.org/10.1145/2964284.2967274]

Yun S, Han D, Chun S, Oh S J, Yoo Y and Choe J. 2019. CutMix: regularization strategy to train strong classifiers with localizable features//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea(South): IEEE: 6022-6031 [DOI: 10.1109/ICCV.2019.00612http://dx.doi.org/10.1109/ICCV.2019.00612]

Zhang D, Zhang H W, Tang J H, Wang M, Hua X S and Sun Q R. 2020a. Feature pyramid transformer//Proceedings of the 16th European Conference on Computer Vision, 2020. Glasgow, UK: Springer: 323-339 [DOI: 10.1007/978-3-030-58604-1_20http://dx.doi.org/10.1007/978-3-030-58604-1_20]

Zhang H Y, Cisse M, Dauphin Y N and Lopez-Paz D. 2018a. Mixup: beyond empirical risk minimization//Proceedings of 2018 International Conference on Learning Representations. Vancouver, Canada: OpenReview. net

Zhang H Y, Wang Y, Dayoub F and Sünderhauf N. 2021a. VarifocalNet: an IoU-aware dense object detector//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 8514-8523 [DOI: 10.1109/CVPR46437.2021.00841http://dx.doi.org/10.1109/CVPR46437.2021.00841]

Zhang L, Zhou S G, Guan J H and Zhang J. 2021c. Accurate few-shot object detection with support-query mutual guidance and hybrid loss//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 14424-14432 [DOI: 10.1109/CVPR46437.2021.01419http://dx.doi.org/10.1109/CVPR46437.2021.01419]

Zhang L F and Ma K S. 2021b. Improve object detection with feature-based knowledge distillation: towards accurate and efficient detectors//Proceedings of the 9th International Conference on Learning Representations. [s. l.]: OpenReview. net

Zhang S F, Chi C, Yao Y Q, Lei Z and Li S Z. 2020b. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 9756-9765 [DOI: 10.1109/CVPR42600.2020.00978http://dx.doi.org/10.1109/CVPR42600.2020.00978]

Zhang S F, Wen L Y, Bian X, Lei Z and Li S Z. 2018b. Single-shot refinement neural network for object detection//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4203-4212 [DOI: 10.1109/CVPR.2018.00442http://dx.doi.org/10.1109/CVPR.2018.00442]

Zhang W, Zhuang X T, Wang X L, Chen Y F and Li Y C. 2021. DS-YOLO: a real-time small object detection algorithm on UAVs. Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 41(1): 86-98

张伟, 庄幸涛, 王雪力, 陈云芳, 李延超. 2021. DS-YOLO: 一种部署在无人机终端上的小目标实时检测算法. 南京邮电大学学报(自然科学版), 41(1): 86-98 [DOI: 10.14132/j.cnki.1673-5439.2021.01.011]

Zhang X S, Wan F, Liu C, Ji R R and Ye Q X. 2019. FreeAnchor: learning to match anchors for visual object detection//Proceedings of the 22nd Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc. : 147-155

Zhang Z S, Qiao S Y, Xie C H, Shen W, Wang B and Yuille A L. 2018c. Single-shot object detection with enriched semantics//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: 5813-5821 [DOI: 10.1109/CVPR.2018.00609http://dx.doi.org/10.1109/CVPR.2018.00609]

Zhao G M, Ge W F and Yu Y Z. 2021. GraphFPN: graph feature pyramid network for object detection//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 2763-2772 [DOI: 10.1109/ICCV48922.2021.00276http://dx.doi.org/10.1109/ICCV48922.2021.00276]

Zhao Q J, Sheng T, Wang Y T, Tang Z, Chen Y, Cai L and Ling H B. 2019. M2Det: a single-shot object detector based on multi-level feature pyramid network//Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Honolulu, USA: AAAI: 9259-9266 [DOI: 10.1609/aaai.v33i01.33019259http://dx.doi.org/10.1609/aaai.v33i01.33019259]

Zhao Y Q, Rao Y, Dong S P and Zhang J Y. 2020. Survey on deep learning object detection. Journal of Image and Graphics, 25(4): 629-654

赵永强, 饶元, 董世鹏, 张君毅. 2020. 深度学习目标检测方法综述. 中国图象图形学报, 25(4): 629-654 [DOI: 10.11834/jig.190307]

Zheng M H, Gao P, Zhang R R, Li K C, Wang X G, Li H S and Dong H. 2020a. End-to-end object detection with adaptive clustering transformer[EB/OL]. [2022-01-18].https://arxiv.org/pdf/2011.09315.pdfhttps://arxiv.org/pdf/2011.09315.pdf

Zheng Z H, Wang P, Liu W, Li J Z, Ye R G and Ren D W. 2020b. Distance-IOU Loss: faster and better learning for bounding box regression//Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI: 12993-13000 [DOI: 10.1609/aaai.v34i07.6999http://dx.doi.org/10.1609/aaai.v34i07.6999]

Zhong Q Y, Li C, Zhang Y Y, Xie D, Yang S C and Pu S L. 2020a Cascade region proposal and global context for deep object detection. Neurocomputing, 395: 170-177 [DOI: 10.1016/j.neucom.2017.12.070http://dx.doi.org/10.1016/j.neucom.2017.12.070]

Zhong Z, Zheng L, Kang G L, Li S Z and Yang Y. 2020b Random erasing data augmentation//Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI: 13001-13008

Zhou D Z, Zhou X C, Zhang H W, Yi S and Ouyang W L. 2020. Cheaper pre-training lunch: an efficient paradigm for object detection//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 258-274 [DOI: 10.1007/978-3-030-58598-3_16http://dx.doi.org/10.1007/978-3-030-58598-3_16]

Zhou P, Ni B B, Geng C, Hu J G and Xu Y. 2018. Scale-transferrableobject detection//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 528-537 [DOI: 10.1109/CVPR.2018.00062http://dx.doi.org/10.1109/CVPR.2018.00062]

Zhou X Y, Wang D Q and Krähenbühl P. 2019a. Objects as points [EB/OL]. [2022-01-18].https://arxiv.org/pdf/1904.07850.pdfhttps://arxiv.org/pdf/1904.07850.pdf

Zhou X Y, Zhuo J C and Krähenbühl P. 2019b. Bottom-up object detection by grouping extreme and center points//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 850-859 [DOI: 10.1109/CVPR.2019.00094http://dx.doi.org/10.1109/CVPR.2019.00094]

Zhu C C, He Y H and Savvides M. 2019. Feature selective anchor-free module for single-shot object detection//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 840-849 [DOI: 10.1109/CVPR.2019.00093http://dx.doi.org/10.1109/CVPR.2019.00093]

Zhu X Z, Su W J, Lu L W, Li B, Wang X G and Dai J F. 2021. Deformable DETR: deformable transformers for end-to-end object detection//Proceedings of 2021 International Conference on Learning Representations. [s. l.]: OpenReview. net

Zhu Y S, Zhao C Y, Wang J Q, Zhao X, Wu Y and Lu H Q. 2017. CoupleNet: coupling global structure with local parts for object detection//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 4146-4154 [DOI: 10.1109/ICCV.2017.444http://dx.doi.org/10.1109/ICCV.2017.444]
