自动驾驶中的三维目标检测算法研究综述
Survey of 3D object detection algorithms for autonomous driving
2024年,第29卷,第11期,页码:3238-3264
纸质出版日期: 2024-11-16
DOI: 10.11834/jig.230779
李昌财, 陈刚, 侯作勋, 黄凯, 张伟. 2024. 自动驾驶中的三维目标检测算法研究综述. 中国图象图形学报, 29(11):3238-3264
Li Changcai, Chen Gang, Hou Zuoxun, Huang Kai, Zhang Wei. 2024. Survey of 3D object detection algorithms for autonomous driving. Journal of Image and Graphics, 29(11):3238-3264
新兴的三维目标检测技术在自动驾驶领域中扮演着关键角色,它通过提供环境感知和障碍物检测等信息,为自动驾驶系统的决策和控制提供了基础。已有许多学者对该领域的优秀方法和成果进行了全面的梳理和研究。然而,由于技术的不断更新和快速进步,持续跟踪该领域的最新进展、保持在知识前沿,不仅是学术界的一项至关重要的任务,也是应对新兴挑战的基础。本文回顾了近两年内的新兴成果,并对该方向的前沿理论进行系统性的阐述。首先,简单介绍三维目标检测的背景知识并回顾相关的综述研究。然后,从数据规模、多样性等方面对KITTI(Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago)等多个流行的数据集进行归纳总结,并进一步介绍相关基准的评测原理。接下来,按照传感器类型和数量将最近的几十种检测方法划分为基于单目、基于立体、基于多视图、基于激光雷达和基于多模态这5个类别,并根据模型架构或数据预处理方式的不同对每一个类别进行更深层次的细分。对于每一类方法,首先简单回顾其代表性算法,然后着重综述该类别中最前沿的方法,并进一步深入分析该类别潜在的发展前景和当前面临的严峻挑战。最后展望了三维目标检测领域未来的研究方向。
Conventional two-dimensional (2D) object detection classifies a target and localizes its bounding box in image-space coordinates but cannot provide the target's real position in three-dimensional (3D) space. This limitation restricts its applicability in autonomous driving (AD) systems, particularly for tasks such as obstacle avoidance and path planning in real 3D environments. The emerging field of 3D object detection represents a substantial technological advance. It primarily relies on neural networks to extract features from input data, commonly camera images or LiDAR point clouds. From these features, a 3D detector predicts the category of each target and furnishes crucial information, including its spatial coordinates, dimensions, and yaw angle in a real-world coordinate system, thereby supplying the essential preliminary information for subsequent operations such as object tracking, trajectory forecasting, and path planning. Consequently, this technology has assumed a vital role in AD as a cornerstone of the perception task. Numerous exceptional methodologies with notable accomplishments have emerged in 3D object detection, and several scholars have conducted comprehensive reviews and in-depth assessments of these works and their outcomes. However, because technology in computer vision evolves rapidly, prior reviews may have omitted the latest developments. Constantly monitoring the most recent advances and staying at the frontier of this field is therefore not only an imperative task for the academic community but also a prerequisite for responding effectively to the challenges that this rapid progress keeps raising. Based on these considerations, this paper conducts a systematic review of the latest developments and cutting-edge theories in 3D object detection. In contrast to earlier review studies, this work covers more cutting-edge methodologies and a broader spectrum of topics. For example, whereas most prior reviews concentrated on individual sensors, this work incorporates a multitude of diverse sensor types. Moreover, it covers a wide array of training strategies, ranging from semi-supervised and weakly supervised methods to active learning and knowledge distillation techniques, thereby substantially enhancing the breadth and depth of the survey. Specifically, this work starts with a concise contextualization of the field's progress and a brief examination of pertinent review research. Subsequently, the fundamental definition of 3D object detection is given, multiple widely used datasets are summarized in terms of data scale and diversity, and the evaluation criteria of the corresponding benchmarks are introduced. Among these datasets, three widely recognized ones are particularly highlighted: KITTI, nuScenes, and Waymo Open.
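The preceding paragraph notes that a 3D detector outputs, for every object, its class together with spatial coordinates, dimensions, and a yaw angle in a real-world coordinate system, and that each benchmark scores such outputs with its own evaluation criteria. The following is a minimal sketch of that box parameterization and of the bird's-eye-view (BEV) overlap on which IoU-based average precision (as used by KITTI-style benchmarks) is built; the 7-parameter layout, the axis conventions, the function names, and the use of the shapely library are illustrative assumptions rather than the survey's own notation:

```python
import numpy as np
from shapely.geometry import Polygon  # used only for the BEV overlap sketch


def box3d_to_corners(x, y, z, l, w, h, yaw):
    """Convert a 7-DoF box (centre, size, heading) to its 8 corner points.

    Assumes a right-handed frame with yaw measured about the vertical axis;
    exact conventions differ between datasets such as KITTI and nuScenes.
    """
    dx, dy, dz = l / 2.0, w / 2.0, h / 2.0
    # Four bottom corners followed by four top corners, centred on the box.
    corners = np.array([
        [ dx,  dy, -dz], [ dx, -dy, -dz], [-dx, -dy, -dz], [-dx,  dy, -dz],
        [ dx,  dy,  dz], [ dx, -dy,  dz], [-dx, -dy,  dz], [-dx,  dy,  dz],
    ])
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])  # rotation about the vertical axis
    return corners @ rot.T + np.array([x, y, z])


def bev_iou(box_a, box_b):
    """Bird's-eye-view IoU of two 7-DoF boxes given as (x, y, z, l, w, h, yaw)."""
    poly_a = Polygon(box3d_to_corners(*box_a)[:4, :2])  # bottom face in x-y
    poly_b = Polygon(box3d_to_corners(*box_b)[:4, :2])
    inter = poly_a.intersection(poly_b).area
    union = poly_a.area + poly_b.area - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical car-sized prediction versus a slightly shifted ground truth.
    pred = (20.0, 0.0, -1.0, 4.5, 1.8, 1.6, np.radians(30.0))
    gt = (20.4, 0.2, -1.0, 4.6, 1.8, 1.6, np.radians(25.0))
    print(f"BEV IoU = {bev_iou(pred, gt):.3f}")
```

nuScenes, by contrast, matches predictions to ground truth by centre distance and folds translation, scale, and orientation errors into its NDS score, which is part of what the survey summarizes under each benchmark's evaluation principles.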
Next, the multitude of detection methods proposed over the past two years is categorized into five distinct groups according to the type and number of sensors involved: monocular-based, stereo-based, multi-view-based, LiDAR-based, and multimodal-based. Each group is further subdivided according to the data preprocessing methods or model architectures employed. Within each category, the examination starts with a review of the pioneering representative algorithms, then offers a detailed exposition of the latest and most advanced methodologies in that area, and concludes with an in-depth analysis of prospective development pathways and the formidable challenges the category currently faces. Among the five categories, the monocular method relies solely on a single camera to classify and localize environmental objects. This approach is cost-effective and easy to implement; however, regressing depth from a single image is an ill-posed problem, which frequently limits its accuracy. The stereo-based method leverages stereo image pairs to enforce geometric constraints, leading to more precise depth estimation and comparatively higher detection accuracy. However, the requirement for stereo camera calibration drives up deployment costs, and the method remains susceptible to environmental factors. The multi-view-based method seeks to establish a unified feature space from multiple surrounding cameras. Unlike the first two approaches, it offers improved safety and practicality thanks to its panoramic perspective, yet the absence of direct geometric constraints between cameras leaves depth recovery inherently ill-posed. LiDAR-based methods excel at directly providing accurate depth information, which eliminates the need for additional depth estimation and yields higher detection efficiency and accuracy than image-centric methods. Despite these advantages, the substantial hardware cost of LiDAR places a considerable financial burden on real-world deployments. Multimodal-based approaches combine the advantages of image and point cloud data, albeit at the cost of the additional computation required to process both modalities concurrently. In a broader context, each of the five categories exhibits unique strengths and limitations, necessitating a careful choice based on budget and the specific requirements of a real-world engineering deployment. After this review of all methodologies, comprehensive statistical analyses of the techniques are conducted on the KITTI, nuScenes, and Waymo Open datasets, covering both detection performance and inference time. In summary, we have meticulously reviewed 3D object detection algorithms in the context of AD; this study encompasses detection algorithms based on the various mainstream sensors, explores the latest advances in the field, and then statistically compares the performance and latency of all methods on widely recognized datasets.
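To make the geometric gap between image-based and LiDAR-based pipelines concrete, the sketch below back-projects a stereo disparity map into a pseudo point cloud using the standard relations Z = f·B/d and the inverse pinhole model; the function name and the KITTI-like intrinsics are hypothetical, and this illustrates the general principle rather than any specific method covered by the survey. Monocular methods face the same back-projection but must first estimate the depth (or disparity) map from a single image, which is precisely the ill-posed step noted above.

```python
import numpy as np


def disparity_to_pseudo_points(disparity, fx, fy, cx, cy, baseline):
    """Back-project a dense disparity map into a pseudo point cloud.

    For every valid pixel (u, v): depth Z = fx * baseline / disparity,
    then X = (u - cx) * Z / fx and Y = (v - cy) * Z / fy (camera frame).
    """
    h, w = disparity.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = disparity > 0                       # skip pixels with no stereo match
    z = fx * baseline / disparity[valid]        # metres
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=-1)         # (N, 3) camera-frame points


if __name__ == "__main__":
    # Toy example: a constant disparity map with made-up KITTI-like intrinsics.
    disp = np.full((375, 1242), 40.0, dtype=np.float32)
    pts = disparity_to_pseudo_points(disp, fx=721.5, fy=721.5,
                                     cx=609.6, cy=172.8, baseline=0.54)
    print(pts.shape, pts[0])
```

LiDAR-based detectors skip this reconstruction entirely because the sensor measures range directly, which underlies the efficiency and accuracy advantage described above, while multimodal methods accept extra computation to fuse both the image and point cloud representations.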
A summary of the current research status is presented, and prospects for future research directions are provided.
关键词:自动驾驶;三维目标检测;单目;立体;多视图;激光雷达;多模态
Keywords: autonomous driving; 3D object detection; monocular; stereo; multi-view; light detection and ranging (LiDAR); multi-modal