Bilateral cross enhancement with self-attention compensation for semantic segmentation of point clouds
2024, Vol. 29, No. 8, Pages: 2388-2398
Print publication date: 2024-08-16
DOI: 10.11834/jig.230430
Zhu Zhongjie, Zhang Rong, Bai Yongqiang, Wang Yuer, Sun Jiamin. 2024. Bilateral cross enhancement with self-attention compensation for semantic segmentation of point clouds. Journal of Image and Graphics, 29(08):2388-2398
Objective
Existing point cloud semantic segmentation methods make insufficient use of geometric and semantic feature information, which degrades segmentation performance and, in particular, the accuracy of local fine-grained segmentation. To address this problem, a new point cloud semantic segmentation algorithm is proposed that combines bilateral cross enhancement with self-attention compensation to fully fuse geometric and semantic contextual information and thereby improve segmentation performance.
Method
First, a spatial aggregation module based on bilateral cross enhancement is designed, which maps local geometric and semantic contextual information into a common space for cross-learning enhancement and then aggregates them into local contextual information. Then, global contextual information is extracted with a self-attention mechanism and fused with the enhanced local contextual information to compensate for the singularity of the local context, yielding a complete feature map. Finally, the multi-resolution features output at each stage of the spatial aggregation module are fed into a feature fusion module for multi-scale feature fusion, and the resulting comprehensive feature map enables high-performance semantic segmentation.
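To make the bilateral cross-enhancement unit concrete, here is a minimal PyTorch-style sketch under stated assumptions: local neighborhoods are built by k-nearest neighbors, geometric and semantic offsets are mapped into a common space by separate MLPs, and the two sides enhance each other by simple addition before max-pooling aggregation. The module name, layer sizes, and the additive enhancement are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

def knn_indices(xyz, k):
    """Indices of the k nearest neighbors of each point (self included).
    xyz: (B, N, 3) coordinates -> (B, N, k) neighbor indices."""
    dist = torch.cdist(xyz, xyz)                # (B, N, N) pairwise distances
    return dist.topk(k, largest=False).indices  # k smallest distances per point

def gather_neighbors(x, idx):
    """x: (B, N, C), idx: (B, N, k) -> neighbor features (B, N, k, C)."""
    b = torch.arange(x.size(0), device=x.device).view(-1, 1, 1)
    return x[b, idx]

class BilateralCrossEnhancement(nn.Module):
    """Hypothetical sketch: cross-enhance local geometric and semantic offsets."""
    def __init__(self, d_sem, d_out, k=16):
        super().__init__()
        self.k = k
        self.geo_mlp = nn.Sequential(nn.Linear(3, d_out), nn.ReLU())      # geometry -> common space
        self.sem_mlp = nn.Sequential(nn.Linear(d_sem, d_out), nn.ReLU())  # semantics -> common space
        self.fuse = nn.Sequential(nn.Linear(2 * d_out, d_out), nn.ReLU())

    def forward(self, xyz, feat):
        # xyz: (B, N, 3) coordinates; feat: (B, N, d_sem) semantic features
        idx = knn_indices(xyz, self.k)
        geo_off = gather_neighbors(xyz, idx) - xyz.unsqueeze(2)    # local geometric offsets
        sem_off = gather_neighbors(feat, idx) - feat.unsqueeze(2)  # local semantic offsets
        geo = self.geo_mlp(geo_off)   # (B, N, k, d_out)
        sem = self.sem_mlp(sem_off)   # (B, N, k, d_out)
        geo_enh = geo + sem           # semantic context enhances geometry
        sem_enh = sem + geo           # geometric context enhances semantics
        local = self.fuse(torch.cat([geo_enh, sem_enh], dim=-1))
        return local.max(dim=2).values  # aggregate neighborhood -> (B, N, d_out)
```

For example, `BilateralCrossEnhancement(d_sem=32, d_out=64)(torch.rand(2, 1024, 3), torch.rand(2, 1024, 32))` returns a (2, 1024, 64) local context map.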
Result
Experimental results show that on the S3DIS (Stanford 3D indoor spaces) dataset, the proposed algorithm achieves a mean intersection over union (mIoU) of 70.2%, a mean class accuracy (mAcc) of 81.7%, and an overall accuracy (OA) of 88.3%, which are 2.4%, 2.0%, and 1.0% higher, respectively, than those of the representative algorithm RandLA-Net. When tested separately on Area 5 of S3DIS, the proposed algorithm reaches an mIoU of 66.2%, 5.0% higher than RandLA-Net.
Conclusion
The spatial aggregation module not only makes full use of local geometric and semantic contextual information to enhance the local context but also fuses local and global contextual information through the self-attention mechanism, which strengthens feature completeness and the correlation between local and global contexts and thus effectively improves local fine-grained segmentation accuracy. In the visual analysis, the proposed algorithm clearly improves the local fine-grained segmentation of point cloud scenes compared with the baseline algorithms, verifying its effectiveness.
Objective
Point cloud semantic segmentation is a computer vision task that aims to segment 3D point cloud data and assign a semantic label to each point. Specifically, according to the location and other attributes of each point, point cloud semantic segmentation assigns every point to a predefined semantic category, such as ground, buildings, vehicles, or pedestrians. Existing methods can be broadly categorized into three types: projection-, voxel-, and raw point cloud-based methods. Projection-based methods project the 3D point cloud onto a 2D plane (e.g., an image) and then apply standard image-based segmentation techniques. Voxel-based methods divide the point cloud space into regular voxel grids and assign a semantic label to each voxel. Both require data transformation, which inevitably loses some feature information. By contrast, raw point cloud-based methods process the point cloud directly without any transformation, which preserves the integrity of the original data fed to the network. Accurate semantic segmentation requires fully considering and exploiting the geometric and semantic feature information of each point in the scene. However, existing methods generally extract, process, and utilize geometric and semantic feature information separately, without considering their correlation, which leads to less precise local fine-grained segmentation. Therefore, this study proposes a new algorithm for point cloud semantic segmentation based on bilateral cross enhancement and self-attention compensation. It not only fully utilizes the geometric and semantic feature information of the point cloud but also constructs offsets between them as a medium for information interaction. In addition, local and global feature information is fused, which enhances feature completeness and ensures that local and global contexts are fully represented and utilized during segmentation. By considering the overall information of the point cloud scene, the algorithm performs better on both local fine-grained details and larger-scale structures.
Method
First, the original input point cloud data are preprocessed to extract geometric contextual information and initial semantic contextual information. The geometric contextual information is represented by the original coordinates of the point cloud in 3D space, while the initial semantic contextual information is extracted with a multilayer perceptron. Next, a spatial aggregation module is designed, which consists of bilateral cross-enhancement and self-attention units. In the bilateral cross-enhancement units, local geometric and semantic contextual information is first extracted by constructing local neighborhoods over the preprocessed geometric and initial semantic contextual information. Offsets are then constructed to facilitate cross-learning and enhancement of the local geometric and semantic contextual information by mapping both onto a common space, and the enhanced information is aggregated into local contextual information. Subsequently, the self-attention mechanism extracts global contextual information, which is fused with the local contextual information to compensate for the singularity of the latter, resulting in a complete feature map. Finally, the multi-resolution feature maps obtained at different stages of the spatial aggregation module are fed into the feature fusion module for multi-scale feature fusion, which produces the final comprehensive feature map and thus enables high-performance semantic segmentation.
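The self-attention compensation step can be sketched as below, assuming a standard multi-head self-attention over all points followed by residual fusion with the local context; the class name, head count, and normalization are hypothetical choices, not the paper's exact design.

```python
import torch
import torch.nn as nn

class SelfAttentionCompensation(nn.Module):
    """Hypothetical sketch: compensate local context with global attention context."""
    def __init__(self, d_model, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, local_feat):
        # local_feat: (B, N, d_model), e.g. the aggregated local context from
        # the bilateral cross-enhancement unit
        glob, _ = self.attn(local_feat, local_feat, local_feat)  # global context over all points
        return self.norm(local_feat + glob)  # residual fusion of local and global context
```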
Result
Experimental results on the Stanford 3D indoor spaces dataset (S3DIS) show a mean intersection over union (mIoU) of 70.2%, a mean class accuracy (mAcc) of 81.7%, and an overall accuracy (OA) of 88.3%, which are 2.4%, 2.0%, and 1.0% higher, respectively, than those of the existing representative algorithm RandLA-Net. Meanwhile, on Area 5 of S3DIS, the mIoU is 66.2%, which is 5.0% higher than that of RandLA-Net. In addition, segmentation results are visualized on the Semantic3D dataset.
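For reference, the three reported metrics can be computed from a class confusion matrix as in the generic sketch below; the function name and the safeguarded denominators are illustrative, not the authors' evaluation code.

```python
import numpy as np

def segmentation_metrics(conf):
    """conf[i, j]: number of points of ground-truth class i predicted as class j."""
    tp = np.diag(conf).astype(float)          # correctly labeled points per class
    gt = conf.sum(axis=1)                     # points per ground-truth class
    pred = conf.sum(axis=0)                   # points per predicted class
    iou = tp / np.maximum(gt + pred - tp, 1)  # per-class intersection over union
    acc = tp / np.maximum(gt, 1)              # per-class accuracy
    return iou.mean(), acc.mean(), tp.sum() / conf.sum()  # mIoU, mAcc, OA
```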
Conclusion
By utilizing the spatial aggregation module, the proposed algorithm maximizes the use of geometric and semantic contextual information, which enriches the details of the local context. In addition, the integration of local and global contextual information through the self-attention mechanism ensures a comprehensive feature representation. As a result, the proposed algorithm significantly improves the segmentation accuracy of fine-grained details in point clouds. Visual analysis further validates its effectiveness: compared with baseline algorithms, the proposed algorithm demonstrates clear superiority in the fine-grained segmentation of local regions in point cloud scenes. In conclusion, the spatial aggregation module and its fusion of local and global contextual information substantially improve the segmentation of local details and offer a promising route to more accurate fine-grained point cloud segmentation.
Keywords: point cloud; semantic segmentation; bilateral cross enhancement; self-attention mechanism; feature fusion
Alonso I, Riazuelo L, Montesano L and Murillo A C. 2020. 3D-MiniNet: learning a 2D representation from point clouds for fast and efficient 3D LIDAR semantic segmentation. IEEE Robotics and Automation Letters, 5(4): 5432-5439 [DOI: 10.1109/LRA.2020.3007440]
Armeni I, Sax S, Zamir A R and Savarese S. 2017. Joint 2D-3D-semantic data for indoor scene understanding [EB/OL]. [2023-06-25]. https://arxiv.org/pdf/1702.01105.pdf
Bi Y W, Zhang L J, Liu Y W, Huang Y S and Liu H. 2023. A local-global feature fusing method for point clouds semantic segmentation. IEEE Access, 11: 68776-68790 [DOI: 10.1109/ACCESS.2023.3293161]
Boulch A, Guerry J, Le Saux B and Audebert N. 2018. SnapNet: 3D point cloud semantic labeling with 2D deep segmentation networks. Computers and Graphics, 71: 189-198 [DOI: 10.1016/j.cag.2017.11.010]
Chen C, Wang Y S, Chen H H, Yan X F, Ren D Y, Guo Y W, Xie H R, Wang F L and Wei M Q. 2023. GeoSegNet: point cloud semantic segmentation via geometric encoder-decoder modeling. The Visual Computer, 40: 5107-5121 [DOI: 10.1007/s00371-023-02853-7]
Chen J J, Kakillioglu B and Velipasalar S. 2022. Background-aware 3-D point cloud segmentation with dynamic point feature aggregation. IEEE Transactions on Geoscience and Remote Sensing, 60: #5703112 [DOI: 10.1109/TGRS.2022.3168555]
Deng S and Dong Q L. 2021. GA-NET: global attention network for point cloud semantic segmentation. IEEE Signal Processing Letters, 28: 1300-1304 [DOI: 10.1109/LSP.2021.3082851]
Du J and Cai G R. 2021. Point cloud semantic segmentation method based on multi-feature fusion and residual optimization. Journal of Image and Graphics, 26(5): 1105-1116 [DOI: 10.11834/jig.200374]
Giang T T H and Ryoo Y J. 2023. Pruning points detection of sweet pepper plants using 3D point clouds and semantic segmentation neural network. Sensors, 23(8): #4040 [DOI: 10.3390/s23084040]
Hackel T, Savinov N, Ladicky L, Wegner J D, Schindler K and Pollefeys M. 2017. Semantic3D.net: a new large-scale point cloud classification benchmark [EB/OL]. [2023-06-25]. https://arxiv.org/pdf/1704.03847.pdf
Hu Q Y, Yang B, Xie L H, Rosa S, Guo Y L, Wang Z H, Trigoni N and Markham A. 2022. Learning semantic segmentation of large-scale point clouds with random sampling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(11): 8338-8354 [DOI: 10.1109/TPAMI.2021.3083288]
Huang Q G, Wang W Y and Neumann U. 2018. Recurrent slice networks for 3D segmentation of point clouds//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 2626-2635 [DOI: 10.1109/CVPR.2018.00278]
Landrieu L and Simonovsky M. 2018. Large-scale point cloud semantic segmentation with superpoint graphs//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4558-4567 [DOI: 10.1109/CVPR.2018.00479]
Li D W, Shi G L, Wu Y H, Yang Y P and Zhao M B. 2021. Multi-scale neighborhood feature extraction and aggregation for point cloud segmentation. IEEE Transactions on Circuits and Systems for Video Technology, 31(6): 2175-2191 [DOI: 10.1109/TCSVT.2020.3023051]
Li Y Y, Bu R, Sun M C, Wu W, Di X H and Chen B Q. 2018. PointCNN: convolution on X-transformed points//Proceedings of the 32nd Conference on Neural Information Processing Systems. Montréal, Canada: Curran Associates Inc.: 828-838 [DOI: 10.5555/3326943.3327020]
Liu S, Cao Y F, Huang W H and Li D D. 2023. LiDAR point cloud semantic segmentation combined with sparse attention and instance enhancement. Journal of Image and Graphics, 28(2): 483-494 [DOI: 10.11834/jig.210787]
Luo H F, Chen C C, Fang L N, Khoshelham K and Shen G X. 2020. MS-RRFSegNet: multiscale regional relation feature segmentation network for semantic segmentation of urban scene point clouds. IEEE Transactions on Geoscience and Remote Sensing, 58(12): 8301-8315 [DOI: 10.1109/TGRS.2020.2985695]
Poux F and Billen R. 2019. Voxel-based 3D point cloud semantic segmentation: unsupervised geometric and relationship featuring vs deep learning methods. ISPRS International Journal of Geo-Information, 8(5): #213 [DOI: 10.3390/IJGI8050213]
Qi C R, Su H, Mo K C and Guibas L J. 2017a. PointNet: deep learning on point sets for 3D classification and segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 652-660 [DOI: 10.1109/CVPR.2017.16]
Qi C R, Yi L, Su H and Guibas L J. 2017b. PointNet++: deep hierarchical feature learning on point sets in a metric space//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 5105-5114 [DOI: 10.5555/3295222.3295263]
Qiu S, Anwar S and Barnes N. 2021. Semantic segmentation for real point cloud scenes via bilateral augmentation and adaptive fusion//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 1757-1767 [DOI: 10.1109/CVPR46437.2021.00180]
Ren D Y, Wu Z Y, Li J W, Yu P P, Guo J, Wei M Q and Guo Y W. 2022. Point attention network for point cloud semantic segmentation. Science China Information Sciences, 65(9): #192104 [DOI: 10.1007/s11432-021-3387-7]
Wang G H, Zhai Q Y and Liu H. 2022. Cross self-attention network for 3D point cloud. Knowledge-Based Systems, 247: #108769 [DOI: 10.1016/j.knosys.2022.108769]
Wang L, Huang Y C, Hou Y L, Zhang S M and Shan J. 2019. Graph attention convolution for point cloud semantic segmentation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 10296-10305 [DOI: 10.1109/CVPR.2019.01054]
Wang S F, Liu Y, Wang L C, Sun Y F and Yin B C. 2023. PASIFTNet: scale-and-directional-aware semantic segmentation of point clouds. Computer-Aided Design, 156: #103462 [DOI: 10.1016/j.cad.2022.103462]
Yan X, Zheng C D, Li Z, Wang S and Cui S G. 2020. PointASNL: robust point clouds processing using nonlocal neural networks with adaptive sampling//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 5588-5597 [DOI: 10.1109/CVPR42600.2020.00563]
Ye X Q, Li J M, Huang H X, Du L and Zhang X L. 2018. 3D recurrent neural networks with context fusion for point cloud semantic segmentation//Proceedings of the 15th European Conference on Computer Vision (ECCV 2018). Munich, Germany: Springer: 415-430 [DOI: 10.1007/978-3-030-01234-2_25]
Zhao H S, Jiang L, Fu C W and Jia J Y. 2019. PointWeb: enhancing local neighborhood features for point cloud processing//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 5565-5573 [DOI: 10.1109/CVPR.2019.00571]
Zhao Y Q, Ma X Y, Hu B, Zhang Q, Ye M and Zhou G Q. 2023. A large-scale point cloud semantic segmentation network via local dual features and global correlations. Computers and Graphics, 111: 133-144 [DOI: 10.1016/j.cag.2023.01.011]
Zheng Y, Lin C Y, Liao K, Zhao Y and Xue S. 2021. LiDAR point cloud segmentation through scene viewpoint offset. Journal of Image and Graphics, 26(10): 2514-2523 [DOI: 10.11834/jig.200424]
Zhong M Y, Li C J, Liu L C, Wen J H, Ma J W and Yu X H. 2020. Fuzzy neighborhood learning for deep 3-D segmentation of point cloud. IEEE Transactions on Fuzzy Systems, 28(12): 3181-3192 [DOI: 10.1109/TFUZZ.2020.2992611]