Classification network for 3D point cloud based on spatial structure convolution and attention mechanism
2024, Vol. 29, No. 2, pp. 520-532
Print publication date: 2024-02-16
DOI: 10.11834/jig.230137
Wu Bin, Liu Yian, Zhao Jie. 2024. Classification network for 3D point cloud based on spatial structure convolution and attention mechanism. Journal of Image and Graphics, 29(02):0520-0532
Objective
3D point cloud classification is a key task with wide applications in computer vision, robotics, and autonomous driving. When existing 3D point cloud classification networks use edge convolution for local feature extraction, the input features tend to show little diversity, and spatial structure information is insufficiently extracted and fused. To address these problems, this study designs a point cloud classification network that combines spatial structure convolution with an attention mechanism.
Method
First, a spatial structure convolution (SSConv) is proposed. Building on edge convolution, it introduces the relative position information between adjacent points to reduce the similarity of input features, and then encodes features separately from the structure and location perspectives, capturing more diverse local geometric structures. Second, a global feature encoding module is designed to distill global feature information from the coordinate data, and an attention mechanism is fused into the network to correlate local and global feature representations, effectively preserving global feature information and enabling adaptive adjustment of the global features. Finally, the local geometric structure information and the global location information are effectively fused to obtain more representative and discriminative feature representations.
Result
Experiments on the public ModelNet40 dataset evaluate the performance of the proposed network model: the overall accuracy and mean accuracy of point cloud classification reach 93.0% and 89.7%, respectively, demonstrating good classification performance and prediction efficiency. The experimental results show that SSConv effectively increases the diversity of the input features, and that encoding location and structure separately improves the expressive power of the local features. In addition, the proposed attention weighting scheme correlates local and global features while preserving the global features.
Conclusion
The proposed network has strong fine-grained feature extraction capability and good classification performance.
Objective
3D point cloud classification is a crucial task with diverse applications in computer vision, robotics, and autonomous driving. The advancement of computing device performance in recent years has enabled researchers to apply deep learning methods to 3D point cloud recognition. Deep learning-based methods for 3D point cloud classification typically divide the feature information captured by a network into two distinct parts: global and local features. Global features describe the overall shape and structure of the point cloud, while local features capture more detailed information about individual points and their neighborhoods. By leveraging both, these methods achieve high accuracy in point cloud classification tasks. Edge convolution (EdgeConv) is currently the most widely used method for local feature extraction in 3D point cloud classification. It incorporates relative position vectors into feature encoding to capture the characteristics of local structures effectively. However, when local structures in a 3D point cloud are similar, encoding relative positions alone may yield similar features and thus poor classification results. Furthermore, encoding only local features may be insufficient for optimal classification, because the correlation between local and global features is also crucial. Current methods frequently employ attention mechanisms to learn attention scores from global features and weight local features accordingly, effectively establishing the correlation between the two. However, these methods may not fully account for the importance of global feature information and may yield suboptimal classification results.
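The EdgeConv construction described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the learned MLP is replaced by a fixed random linear map with a tanh nonlinearity, and all shapes are toy-sized.

```python
import numpy as np

def knn_indices(points, k):
    """Indices of the k nearest neighbors of every point (excluding the point itself)."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)  # (N, N) squared distances
    return np.argsort(d2, axis=1)[:, 1:k + 1]                      # drop self (distance 0)

def edgeconv(points, k, weight):
    """EdgeConv-style layer: edge feature [x_i, x_j - x_i] -> shared map -> max over neighbors."""
    idx = knn_indices(points, k)                        # (N, k) neighbor indices
    centers = np.repeat(points[:, None, :], k, axis=1)  # x_i broadcast to each of its k edges
    rel = points[idx] - centers                         # relative position vectors x_j - x_i
    edge_feats = np.concatenate([centers, rel], -1)     # (N, k, 6)
    return np.tanh(edge_feats @ weight).max(axis=1)     # stand-in "MLP" + max aggregation

rng = np.random.default_rng(0)
pts = rng.normal(size=(32, 3)).astype(np.float32)   # a toy point cloud of 32 points
w = rng.normal(size=(6, 16)).astype(np.float32)     # stand-in for learned MLP weights
out = edgeconv(pts, k=8, weight=w)
print(out.shape)  # (32, 16): one local feature per point
```

Note that if two neighborhoods have near-identical relative position vectors, the edge features above become near-identical as well, which is exactly the input-similarity issue the text describes.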
Method
To address the aforementioned challenges, this study proposes a novel 3D point cloud classification network that leverages spatial structure convolution (SSConv) and attention mechanisms. The network architecture consists of two parts: a local feature encoding (LFE) module and a global feature encoding (GFE) module. The former uses SSConv to encode local features from the location and structure perspectives, while the latter learns a global feature representation from the raw coordinate data. Furthermore, to enable effective correlation and complementarity between the two kinds of feature information, we introduce an attention mechanism that adaptively adjusts the global features through weighted operations. The LFE module comprises two operations: graph construction and feature extraction. It first applies the K-nearest neighbor (KNN) algorithm to identify adjacent points and construct a graph structure. SSConv, the core feature extraction operation, is implemented with a multilayer perceptron. Compared with EdgeConv, SSConv additionally introduces the relative position vectors between adjacent points. This operation effectively increases the correlation distance between raw input data, enriches local region structure information, and enhances the spatial expressiveness of the extracted high-level semantic information. To capture more effective local structure features, feature extraction encodes structure and location separately. The location encoding branch encodes the coordinate information on its own to obtain richer location features describing the spatial location of each point, while the structure encoding branch encodes the relative position vectors on their own to learn the structure information of the local region, describing the overall geometric structure of the local neighborhood.
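As a rough NumPy illustration of the two-branch idea, the sketch below encodes raw coordinates in a location branch and relative position vectors in a structure branch, then concatenates the results. The exact SSConv formulation is not reproduced here; in particular, reading the "relative position vectors between adjacent points" as offsets between consecutive KNN neighbors is an assumption made for this sketch.

```python
import numpy as np

def knn(points, k):
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    return np.argsort(d2, axis=1)[:, 1:k + 1]

def mlp(x, w):
    return np.maximum(x @ w, 0.0)  # shared linear map + ReLU as a stand-in MLP

def ssconv_sketch(points, k, w_loc, w_struct):
    """Hypothetical two-branch encoding: location and structure encoded separately."""
    idx = knn(points, k)                       # (N, k) neighbor indices
    nbrs = points[idx]                         # (N, k, 3) neighbor coordinates
    rel_center = nbrs - points[:, None, :]     # center-to-neighbor offsets (as in EdgeConv)
    rel_nbr = nbrs - np.roll(nbrs, 1, axis=1)  # assumed: offsets between adjacent neighbors
    loc = mlp(points, w_loc)                   # location branch: raw coordinates only
    struct = mlp(np.concatenate([rel_center, rel_nbr], -1), w_struct).max(1)  # structure branch
    return np.concatenate([loc, struct], -1)   # fused per-point feature

rng = np.random.default_rng(1)
pts = rng.normal(size=(64, 3)).astype(np.float32)
feat = ssconv_sketch(pts, k=8,
                     w_loc=rng.normal(size=(3, 16)).astype(np.float32),
                     w_struct=rng.normal(size=(6, 16)).astype(np.float32))
print(feat.shape)  # (64, 32)
```

Even when two neighborhoods have similar center-to-neighbor offsets, their neighbor-to-neighbor offsets generally differ, which is the diversity-raising effect the text attributes to SSConv.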
The global feature encoding module maps raw coordinate data to high-dimensional features, which are used as the global feature representation of the point cloud. In addition, the module includes an attention mechanism to enhance the correlation between local and global features. In particular, an attention weighting method is used to guide the learning of global feature information by using local feature information. This operation enables correlation and fusion between local and global feature representations while preserving raw feature information.
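One plausible reading of this attention weighting is sketched below with hypothetical shapes: attention scores learned from the local features rescale the global features, while a residual path keeps the raw global information intact. This is an illustrative sketch, not the paper's exact formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_adjust(local_feat, global_feat, w):
    """Assumed scheme: local features produce per-channel scores in (0, 1) that
    rescale the global features; the residual term preserves the raw global
    feature information, as the text emphasizes."""
    scores = sigmoid(local_feat @ w)           # (N, C) attention scores
    return global_feat + scores * global_feat  # adjusted global features

rng = np.random.default_rng(2)
local = rng.normal(size=(64, 32)).astype(np.float32)   # toy local feature map
glob = rng.normal(size=(64, 32)).astype(np.float32)    # toy global feature map
w = rng.normal(size=(32, 32)).astype(np.float32)       # hypothetical score weights
adjusted = attention_adjust(local, glob, w)
print(adjusted.shape)  # (64, 32)
```

Because the scores lie in (0, 1), each global channel is scaled by a factor between 1 and 2 and never zeroed out, so no raw global information is discarded by the weighting.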
Result
To evaluate the performance of the proposed network model, experimental validation is conducted on the publicly available ModelNet40 dataset, which consists of 9 843 training models and 2 468 testing models in 40 classes. Classification performance is evaluated using overall accuracy (OA) and mean accuracy (mAcc). The proposed model is compared against four pointwise methods, two convolution-based methods, two graph convolution-based methods, and four attention mechanism-based methods. The experimental results demonstrate that the proposed network performs well on the point cloud classification task and effectively represents local and global features. The proposed method achieves an OA of 93.0%, outperforming the dynamic graph convolutional neural network (DGCNN) by 0.1%, PointWeb by 0.7%, and PointCNN by 0.8%. In addition, the mAcc of the proposed method reaches 89.7%. Furthermore, an ablation experiment validates the efficacy of SSConv: replacing SSConv with EdgeConv in the network architecture reduces OA by 0.5% on ModelNet40, demonstrating that SSConv is better suited for local representation than EdgeConv. Another experiment verifies the diversity of SSConv's input features, with feature correlation measured by Euclidean, cosine, and correlation distances. The results indicate that SSConv enhances the diversity among input features more effectively than EdgeConv. Finally, visualizations of the intermediate-layer features of the model demonstrate that SSConv learns more distinctive features.
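The three distance measures used in the diversity experiment can be computed directly; the sketch below shows each on a small pair of vectors (the vectors themselves are made up for illustration).

```python
import numpy as np

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return float(np.linalg.norm(a - b))

def cosine_distance(a, b):
    """1 minus cosine similarity: 0 when the vectors point the same way."""
    return float(1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def correlation_distance(a, b):
    """Cosine distance of the mean-centered vectors: 0 when perfectly correlated."""
    ac, bc = a - a.mean(), b - b.mean()
    return float(1.0 - (ac @ bc) / (np.linalg.norm(ac) * np.linalg.norm(bc)))

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 4.0, 6.0, 8.0])
print(euclidean(a, b))             # sqrt(30) ~ 5.477
print(cosine_distance(a, b))       # ~0: b points in the same direction as a
print(correlation_distance(a, b))  # ~0: b is perfectly linearly correlated with a
```

Larger average pairwise distances among the features fed into a convolution indicate more diverse inputs, which is the quantity the experiment compares between SSConv and EdgeConv.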
Conclusion
The proposed network model achieves better classification results, with an OA of 93.0% and an mAcc of 89.7%, surpassing those of existing methods. The proposed spatial structure convolution effectively enhances the variability of input features, allowing the model to learn more diverse local feature representations of objects. The proposed attention-based global feature encoding method effectively adjusts features and fully exploits the relationship between local and global feature information while preserving the global features. In summary, the proposed network model exhibits good capability for fine-grained feature extraction and achieves good classification performance.
Keywords: point cloud; edge convolution (EdgeConv); spatial structure; attention mechanism; classification
Chen C, Fragonara L Z and Tsourdos A. 2021. GAPointNet: graph attention based point neural network for exploiting local feature of point cloud. Neurocomputing, 438: 122-132 [DOI: 10.1016/j.neucom.2021.01.095]
Chen H J, Da F P and Gai S Y. 2021. Deep 3D point cloud classification network based on competitive attention fusion. Journal of Zhejiang University (Engineering Science), 55(12): 2342-2351 [DOI: 10.3785/j.issn.1008-973X.2021.12.014]
Deng L T and Fang Z J. 2022. Point cloud analysis method based on feature negative feedback convolution. Laser and Optoelectronics Progress, 59(12): #1210006 [DOI: 10.3788/LOP202259.1210006]
Engel N, Belagiannis V and Dietmayer K. 2021. Point Transformer. IEEE Access, 9: 134826-134840 [DOI: 10.1109/ACCESS.2021.3116304]
Guo M H, Cai J X, Liu Z N, Mu T J, Martin R R and Hu S M. 2021a. PCT: point cloud Transformer. Computational Visual Media, 7(2): 187-199 [DOI: 10.1007/s41095-021-0229-5]
Guo Y L, Wang H Y, Hu Q Y, Liu H, Liu L and Bennamoun M. 2021b. Deep learning for 3D point clouds: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(12): 4338-4364 [DOI: 10.1109/TPAMI.2020.3005434]
Lei H, Akhtar N and Mian A. 2019. Octree guided CNN with spherical kernels for 3D point clouds//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 9623-9632 [DOI: 10.1109/CVPR.2019.00986]
Li Y Y, Bu R, Sun M C, Wu W, Di X H and Chen B Q. 2018. PointCNN: convolution on Χ-transformed points//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal, Canada: Curran Associates Inc.: 828-838
Lu D N, Xie Q, Xu L L and Li J. 2022. 3DCTN: 3D convolution-Transformer network for point cloud classification [EB/OL]. [2023-02-21]. http://arxiv.org/pdf/2203.00828.pdf
Maturana D and Scherer S. 2015. VoxNet: a 3D convolutional neural network for real-time object recognition//Proceedings of 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems. Hamburg, Germany: IEEE: 922-928 [DOI: 10.1109/IROS.2015.7353481]
Qi C R, Su H, Mo K C and Guibas L J. 2017a. PointNet: deep learning on point sets for 3D classification and segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 77-85 [DOI: 10.1109/CVPR.2017.16]
Qi C R, Yi L, Su H and Guibas L J. 2017b. PointNet++: deep hierarchical feature learning on point sets in a metric space//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 5105-5114
Qiu S, Anwar S and Barnes N. 2022. Geometric back-projection network for point cloud classification. IEEE Transactions on Multimedia, 24: 1943-1955 [DOI: 10.1109/TMM.2021.3074240]
Riegler G, Ulusoy A O and Geiger A. 2017. OctNet: learning deep 3D representations at high resolutions//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6620-6629 [DOI: 10.1109/CVPR.2017.701]
Song W, Cai W Y, He S Q and Li W J. 2021. Dynamic graph convolution with spatial attention for point cloud classification and segmentation. Journal of Image and Graphics, 26(11): 2691-2702 [DOI: 10.11834/jig.200550]
Su H, Maji S, Kalogerakis E and Learned-Miller E. 2015. Multi-view convolutional neural networks for 3D shape recognition//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 945-953 [DOI: 10.1109/ICCV.2015.114]
Thomas H, Qi C R, Deschaud J E, Marcotegui B, Goulette F and Guibas L. 2019. KPConv: flexible and deformable convolution for point clouds//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 6410-6419 [DOI: 10.1109/ICCV.2019.00651]
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł and Polosukhin I. 2017. Attention is all you need//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 6000-6010
Wang Y, Sun Y B, Liu Z W, Sarma S E, Bronstein M M and Solomon J M. 2019. Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics, 38(5): #146 [DOI: 10.1145/3326362]
Wei X, Yu R X and Sun J. 2020. View-GCN: view-based graph convolutional network for 3D shape analysis//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 1847-1856 [DOI: 10.1109/CVPR42600.2020.00192]
Wu Z R, Song S R, Khosla A, Yu F, Zhang L G, Tang X O and Xiao J X. 2015. 3D ShapeNets: a deep representation for volumetric shapes//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 1912-1920 [DOI: 10.1109/CVPR.2015.7298801]
Xiang X Y, Li G Y, Wang L, Zong W P, Lyu Z P and Xiang F Z. 2023. Semantic segmentation of point clouds using local geometric features and dilated neighborhoods. Geomatics and Information Science of Wuhan University, 48(4): 534-541 [DOI: 10.13203/j.whugis20200567]
Yan X, Zheng C D, Li Z, Wang S and Cui S G. 2020. PointASNL: robust point clouds processing using nonlocal neural networks with adaptive sampling//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 5588-5597 [DOI: 10.1109/CVPR42600.2020.00563]
Yang W B, Sheng S Q, Luo X F and Xie S R. 2022. Geometric relation based point clouds classification and segmentation. Concurrency and Computation: Practice and Experience, 34(11): #6845 [DOI: 10.1002/cpe.6845]
Yang Z and Wang L W. 2019. Learning relationships for multi-view 3D object recognition//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 7504-7513 [DOI: 10.1109/ICCV.2019.00760]
Yu T, Meng J J and Yuan J S. 2018. Multi-view harmonized bilinear network for 3D object recognition//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 186-194 [DOI: 10.1109/CVPR.2018.00027]
Zhang K G, Hao M, Wang J, de Silva C W and Fu C L. 2019. Linked dynamic graph CNN: learning on point cloud via linking hierarchical features [EB/OL]. [2023-02-21]. http://arxiv.org/pdf/1904.10014.pdf
Zhao H S, Jiang L, Fu C W and Jia J Y. 2019. PointWeb: enhancing local neighborhood features for point cloud processing//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 5560-5568 [DOI: 10.1109/CVPR.2019.00571]
Zhong Q and Han X F. 2022. Point cloud learning with Transformer [EB/OL]. [2023-02-21]. http://arxiv.org/pdf/2104.13636.pdf