融合点云深度信息的3D目标检测与分类
3D object detection and classification combined with point cloud depth information
2024年29卷第8期 页码:2399-2412
纸质出版日期: 2024-08-16
DOI: 10.11834/jig.230568
周昊, 齐洪钢, 邓永强, 李娟娟, 梁浩, 苗军. 2024. 融合点云深度信息的3D目标检测与分类. 中国图象图形学报, 29(08):2399-2412
Zhou Hao, Qi Honggang, Deng Yongqiang, Li Juanjuan, Liang Hao, Miao Jun. 2024. 3D object detection and classification combined with point cloud depth information. Journal of Image and Graphics, 29(08):2399-2412
目的
基于点云的3D目标检测是自动驾驶领域的重要技术之一。由于点云的非结构化特性,通常将点云进行体素化处理,然后基于体素特征完成3D目标检测任务。在基于体素的3D目标检测算法中,对点云进行体素化时会导致部分点云的数据信息和结构信息的损失,降低检测效果。针对该问题,本文提出一种融合点云深度信息的方法,有效提高了3D目标检测的精度。
方法
首先将点云通过球面投影的方法转换为深度图像,然后将深度图像与3D目标检测算法提取的特征图进行融合,从而对损失信息进行补全。由于此时的融合特征以2D伪图像的形式表示,因此使用YOLOv7(you only look once v7)中的主干网络提取融合特征。最后设计回归与分类网络,将提取到的融合特征送入到网络中预测目标的位置、大小以及类别。
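As a concrete illustration of the spherical-projection step described above, the following is a minimal Python sketch that converts a LiDAR scan into a depth (range) image. The vertical field-of-view bounds and image resolution are illustrative assumptions for a KITTI-style sensor, not necessarily the settings used in the paper.

import numpy as np

def point_cloud_to_depth_image(points, h=64, w=1024,
                               fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an N x 4 LiDAR scan (x, y, z, intensity) onto a spherical
    range image whose pixel values are the depths of the projected points."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points[:, :3], axis=1)          # range of each point

    yaw = np.arctan2(y, x)                                  # azimuth angle
    pitch = np.arcsin(z / np.clip(depth, 1e-6, None))       # elevation angle

    fov_up = np.deg2rad(fov_up_deg)
    fov_down = np.deg2rad(fov_down_deg)
    fov = fov_up - fov_down

    # Normalize angles to [0, 1], then scale to pixel coordinates.
    u = 0.5 * (1.0 - yaw / np.pi) * w
    v = (1.0 - (pitch - fov_down) / fov) * h
    u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)

    # Fill pixels from far to near so that the nearest point wins when
    # several points fall into the same cell.
    order = np.argsort(depth)[::-1]
    image = np.zeros((h, w), dtype=np.float32)
    image[v[order], u[order]] = depth[order]
    return image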
结果
本文方法在KITTI(Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago)数据集和DAIR-V2X数据集上进行测试。以AP(average precision)值为评价指标,在KITTI数据集上,改进算法PP-Depth相较于PointPillars在汽车、行人和自行车类别上分别有0.84%、2.3%和1.77%的提升。以自行车简单难度为例,改进算法PP-YOLO-Depth相较于PointPillars、PP-YOLO和PP-Depth分别有5.15%、1.1%和2.75%的提升。在DAIR-V2X数据集上,PP-Depth相较于PointPillars在汽车、行人和自行车类别上分别有17.46%、20.72%和12.7%的提升。以汽车简单难度为例,PP-YOLO-Depth相较于PointPillars、PP-YOLO和PP-Depth分别有13.53%、5.59%和1.08%的提升。
结论
本文方法在KITTI数据集和DAIR-V2X数据集上都取得了较好表现,减少了点云在体素化过程中的信息损失并提高了网络对融合特征的提取能力和多尺度目标的检测性能,使目标检测结果更加准确。
Objective
Perception systems are integral components of modern autonomous driving systems. They are designed to estimate the state of the surrounding environment accurately and to provide reliable observations for prediction and planning. 3D object detection, which predicts the location, size, and category of key 3D objects near the autonomous vehicle, is an important part of the perception system. The common data types in 3D object detection are images and point clouds. Unlike an image, a point cloud is a set of points in 3D space, where the position of each point is given by its coordinates in a 3D coordinate system and is usually accompanied by attributes such as reflection intensity. In computer vision, point clouds are often used to represent the shape and structure of 3D objects. Point-cloud-based 3D object detection methods therefore exploit real spatial information and often have advantages in detection accuracy and speed. However, because of its unstructured nature, the point cloud is usually converted into a 3D voxel grid, with each voxel treated as a 3D feature vector; a 3D convolutional network then extracts voxel features, and the detection task is completed on these features. In voxel-based 3D object detection algorithms, voxelizing the point cloud loses part of its data and structural information, which degrades detection performance. We propose a method that fuses point cloud depth information to address this problem. Our method uses the depth information of the point cloud as fusion information to complement the information lost during voxelization. It also uses the efficient YOLOv7-Net backbone to extract the fused features, improving multi-scale object detection performance and feature extraction capability and effectively increasing the accuracy of 3D object detection.
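To make the information loss discussed above concrete, the sketch below shows a typical pillar-style voxelization in the spirit of PointPillars (Lang et al., 2019): points are binned into a fixed 2D grid, and each cell keeps at most a fixed number of points, so any points beyond that cap, as well as the exact geometry within a cell, are discarded. The grid extent, pillar size, and per-pillar cap here are illustrative assumptions, not the paper's exact configuration.

import numpy as np

def voxelize_pillars(points, x_range=(0.0, 69.12), y_range=(-39.68, 39.68),
                     pillar_size=0.16, max_points_per_pillar=32):
    """Group (x, y, z, intensity) points into vertical pillars on a 2D grid.
    Points exceeding the per-pillar cap are dropped, which is one source of
    the information loss that the proposed method tries to compensate for."""
    nx = int(round((x_range[1] - x_range[0]) / pillar_size))
    ny = int(round((y_range[1] - y_range[0]) / pillar_size))

    ix = np.floor((points[:, 0] - x_range[0]) / pillar_size).astype(np.int32)
    iy = np.floor((points[:, 1] - y_range[0]) / pillar_size).astype(np.int32)
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    ix, iy, pts = ix[keep], iy[keep], points[keep]

    pillars = {}
    dropped = 0
    for i, j, p in zip(ix, iy, pts):
        cell = pillars.setdefault((int(i), int(j)), [])
        if len(cell) < max_points_per_pillar:
            cell.append(p)
        else:
            dropped += 1          # geometry beyond the cap is lost
    return pillars, dropped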
Method
The point cloud is first converted into a depth image through spherical projection to reduce the information lost during voxelization. The depth image is a grayscale image generated from the point cloud in which the gray value of each pixel encodes the distance from the corresponding point to the origin of the 3D coordinate system. The depth image therefore provides a rich feature representation of the point cloud, and its depth information can serve as fusion information to complement what is lost during voxelization. The depth image is then fused with the feature map extracted by the 3D object detection algorithm. Because the fused features are represented as 2D pseudo-images, an efficient 2D backbone is chosen to extract them. The backbone feature extraction network of YOLOv7 uses an adaptive convolution module that adjusts the convolution kernel size and receptive field according to object scale, which improves the detection of multi-scale objects. In addition, the feature fusion module and feature pyramid pooling module of YOLOv7-Net further enhance the feature extraction ability and detection performance of the network. We therefore use YOLOv7-Net to extract the fused features. Finally, a classification and regression network is designed, and the extracted fused features are fed into it to predict the category, position, and size of each object.
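The following is a minimal sketch of the fusion step under stated assumptions: the depth image is resized to the spatial resolution of the detector's pseudo-image feature map and concatenated as an extra channel, after which a small convolutional stack (standing in for the YOLOv7 backbone used in the paper) extracts the fused features. The channel counts, the bilinear resizing choice, and the stub backbone are assumptions made for illustration only.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthFusion(nn.Module):
    """Concatenate a 1-channel depth image with a C-channel pseudo-image
    feature map, then extract fused features with a small convolutional
    stack used here as a stand-in for the YOLOv7 backbone."""
    def __init__(self, feat_channels=64, out_channels=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(feat_channels + 1, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.SiLU(),
            nn.Conv2d(out_channels, out_channels, 3, stride=2, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.SiLU(),
        )

    def forward(self, pseudo_image, depth_image):
        # pseudo_image: (B, C, H, W); depth_image: (B, 1, h, w)
        depth = F.interpolate(depth_image, size=pseudo_image.shape[-2:],
                              mode='bilinear', align_corners=False)
        fused = torch.cat([pseudo_image, depth], dim=1)   # add depth as a channel
        return self.backbone(fused)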
Result
Our method is tested on the Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago (KITTI) 3D object detection dataset and the DAIR-V2X object detection dataset. Using average precision (AP) as the evaluation metric, PP-Depth improves on PointPillars by 0.84%, 2.3%, and 1.77% for the car, pedestrian, and bicycle categories on the KITTI dataset. Taking the easy difficulty level of the bicycle category as an example, PP-YOLO-Depth improves on PointPillars, PP-YOLO, and PP-Depth by 5.15%, 1.1%, and 2.75%, respectively. On the DAIR-V2X dataset, PP-Depth improves on PointPillars by 17.46%, 20.72%, and 12.7% for the car, pedestrian, and bicycle categories. Taking the easy difficulty level of the car category as an example, PP-YOLO-Depth improves on PointPillars, PP-YOLO, and PP-Depth by 13.53%, 5.59%, and 1.08%, respectively.
Conclusion
Experimental results show that our method performs well on the KITTI 3D object detection dataset and the DAIR-V2X object detection dataset. It reduces the information lost by the point cloud during voxelization and improves the network's ability to extract fused features and to detect multi-scale objects, yielding more accurate object detection results.
自动驾驶；3D点云目标检测；深度信息融合；点云体素化；KITTI数据集
autonomous driving; 3D point cloud object detection; depth information fusion; point cloud voxelization; KITTI dataset
Charles R Q, Su H, Kaichun M and Guibas L J. 2017. PointNet: deep learning on point sets for 3D classification and segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 77-85 [DOI: 10.1109/CVPR.2017.16]
Chen X Z, Ma H M, Wan J, Li B and Xia T. 2017. Multi-view 3D object detection network for autonomous driving//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6526-6534 [DOI: 10.1109/CVPR.2017.691]
Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S and Schiele B. 2016. The Cityscapes dataset for semantic urban scene understanding//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 3213-3223 [DOI: 10.1109/CVPR.2016.350]
Cui Y D, Chen R, Chu W B, Chen L, Tian D X, Li Y and Cao D P. 2022. Deep learning for image and point cloud fusion in autonomous driving: a review. IEEE Transactions on Intelligent Transportation Systems, 23(2): 722-739 [DOI: 10.1109/TITS.2020.3023541]
Geiger A, Lenz P, Stiller C and Urtasun R. 2013. Vision meets robotics: the KITTI dataset. The International Journal of Robotics Research, 32(11): 1231-1237 [DOI: 10.1177/0278364913491297]
Girshick R. 2015. Fast R-CNN//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 1440-1448 [DOI: 10.1109/ICCV.2015.169]
Gong J Y, Lou Y J, Liu F Q, Zhang Z W, Chen H M, Zhang Z Z, Tan X, Xie Y and Ma L Z. 2023. Scene point cloud understanding and reconstruction technologies in 3D space. Journal of Image and Graphics, 28(6): 1741-1766
龚靖渝, 楼雨京, 柳奉奇, 张志伟, 陈豪明, 张志忠, 谭鑫, 谢源, 马利庄. 2023. 三维场景点云理解与重建技术. 中国图象图形学报, 28(6): 1741-1766 [DOI: 10.11834/jig.230004]
Guo Y L, Wang H Y, Hu Q Y, Liu H, Liu L and Bennamoun M. 2021. Deep learning for 3D point clouds: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(12): 4338-4364 [DOI: 10.1109/TPAMI.2020.3005434]
Ku J, Mozifian M, Lee J, Harakeh A and Waslander S L. 2018. Joint 3D proposal generation and object detection from view aggregation//Proceedings of 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems. Madrid, Spain: IEEE: 1-8 [DOI: 10.1109/IROS.2018.8594049]
Lang A H, Vora S, Caesar H, Zhou L B, Yang J and Beijbom O. 2019. PointPillars: fast encoders for object detection from point clouds//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 12689-12697 [DOI: 10.1109/CVPR.2019.01298]
Li X Y, Ye Z H, Wei S K, Chen Z, Chen X T, Tian Y H, Dang J W, Fu S J and Zhao Y. 2023. 3D object detection for autonomous driving from image: a survey——benchmarks, constraints and error analysis. Journal of Image and Graphics, 28(6): 1709-1740
李熙莹, 叶芝桧, 韦世奎, 陈泽, 陈小彤, 田永鸿, 党建武, 付树军, 赵耀. 2023. 基于图像的自动驾驶3D目标检测综述——基准、制约因素和误差分析. 中国图象图形学报, 28(6): 1709-1740 [DOI: 10.11834/jig.230036]
Lin T Y, Goyal P, Girshick R, He K M and Dollár P. 2020. Focal loss for dense object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(2): 318-327 [DOI: 10.1109/TPAMI.2018.2858826]
Nikolovski G, Reke M, Elsen I and Schiffer S. 2021. Machine learning based 3D object detection for navigation in unstructured environments//Proceedings of 2021 IEEE Intelligent Vehicles Symposium Workshops. Nagoya, Japan: IEEE: 236-242 [DOI: 10.1109/IVWorkshops54471.2021.9669218]
Pan F and Bao H. 2021. Research progress of automatic driving control technology based on reinforcement learning. Journal of Image and Graphics, 26(1): 28-35
潘峰, 鲍泓. 2021. 强化学习的自动驾驶控制技术研究进展. 中国图象图形学报, 26(1): 28-35 [DOI: 10.11834/jig.200428]
Qi C R, Yi L, Su H and Guibas L J. 2017. PointNet++: deep hierarchical feature learning on point sets in a metric space//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: ACM: 5105-5114 [DOI: 10.5555/3295222.3295263]
Qian R, Lai X and Li X R. 2022. 3D object detection for autonomous driving: a survey. Pattern Recognition, 130: #108796 [DOI: 10.1016/j.patcog.2022.108796]
Rukhovich D, Vorontsova A and Konushin A. 2022. ImVoxelNet: image to voxels projection for monocular and multi-view general-purpose 3D object detection//Proceedings of 2022 IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa, USA: IEEE: 1265-1274 [DOI: 10.1109/WACV51458.2022.00133]
Shao Y H, Zhang D, Chu H Y, Zhang X Q and Rao Y B. 2022. A review of YOLO object detection based on deep learning. Journal of Electronics and Information Technology, 44(10): 3697-3708
邵延华, 张铎, 楚红雨, 张晓强, 饶云波. 2022. 基于深度学习的YOLO目标检测综述. 电子与信息学报, 44(10): 3697-3708 [DOI: 10.11999/JEIT210790]
Shi S S, Wang X G and Li H S. 2019. PointRCNN: 3D object proposal generation and detection from point cloud//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 770-779 [DOI: 10.1109/CVPR.2019.00086]
Sindagi V A, Zhou Y and Tuzel O. 2019. MVX-Net: multimodal VoxelNet for 3D object detection//Proceedings of 2019 International Conference on Robotics and Automation. Montreal, Canada: IEEE: 7276-7282 [DOI: 10.1109/ICRA.2019.8794195]
Tao S B, Liang C, Jiang T P, Yang Y J and Wang Y J. 2021. Sparse voxel pyramid neighborhood construction and classification of LiDAR point cloud. Journal of Image and Graphics, 26(11): 2703-2712
陶帅兵, 梁冲, 蒋腾平, 杨玉娇, 王永君. 2021. 激光点云的稀疏体素金字塔邻域构建与分类. 中国图象图形学报, 26(11): 2703-2712 [DOI: 10.11834/jig.200262]
Vora S, Lang A H, Helou B and Beijbom O. 2020. PointPainting: sequential fusion for 3D object detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 4603-4611 [DOI: 10.1109/CVPR42600.2020.00466]
Wang C Y, Bochkovskiy A and Liao H Y M. 2023. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 7464-7475 [DOI: 10.1109/CVPR52729.2023.00721]
Xu X K, Ma Y, Qian X and Zhang Y. 2021. Scale-aware EfficientDet: real-time pedestrian detection algorithm for automated driving. Journal of Image and Graphics, 26(1): 93-100
徐歆恺, 马岩, 钱旭, 张龑. 2021. 自动驾驶场景的尺度感知实时行人检测. 中国图象图形学报, 26(1): 93-100 [DOI: 10.11834/jig.200445]
Yan Y, Mao Y and Li B. 2018. SECOND: sparsely embedded convolutional detection. Sensors, 18(10): #3337 [DOI: 10.3390/s18103337]
Yu H B, Luo Y Z, Shu M, Huo Y Y, Yang Z B, Shi Y F, Guo Z L, Li H Y, Hu X, Yuan J R and Nie Z Q. 2022. DAIR-V2X: a large-scale dataset for vehicle-infrastructure cooperative 3D object detection//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 21329-21338 [DOI: 10.1109/CVPR52688.2022.02067]
Zhang S F, Chi C, Yao Y Q, Lei Z and Li S Z. 2020. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 9759-9768 [DOI: 10.1109/CVPR42600.2020.00978]
Zhou Y and Tuzel O. 2018. VoxelNet: end-to-end learning for point cloud based 3D object detection//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4490-4499 [DOI: 10.1109/CVPR.2018.00472]