嵌入Transformer结构的多尺度点云补全
Multi-scale Transformer based point cloud completion network
2022, Vol. 27, No. 2, Pages 538-549
Print publication date: 2022-02-16
Accepted: 2021-11-02
DOI: 10.11834/jig.210510
刘心溥, 马燕新, 许可, 万建伟, 郭裕兰. 嵌入Transformer结构的多尺度点云补全[J]. 中国图象图形学报, 2022,27(2):538-549.
Xinpu Liu, Yanxin Ma, Ke Xu, Jianwei Wan, Yulan Guo. Multi-scale Transformer based point cloud completion network[J]. Journal of Image and Graphics, 2022,27(2):538-549.
Objective
Most current deep learning algorithms for point cloud completion adopt an autoencoder structure. However, the multilayer perceptron (MLP) networks commonly used at the encoder tend to focus only on the overall shape of the point cloud and struggle to extract fine detail features effectively, so missing structures are poorly completed. An accurate local feature extraction algorithm is therefore needed for the point cloud completion task.
Method
To address this problem, this paper proposes a multi-scale point cloud completion algorithm with embedded attention modules. The network adopts an encoder-decoder structure overall: a feature embedding layer and a Transformer layer at the encoder extract and fuse feature information from the incomplete point cloud at three different resolutions, which is fed into a fully connected decoder that outputs the missing point cloud in a stage-by-stage manner. Finally, an attention-based discriminator is added at the decoder, borrowing the idea of generative adversarial networks (GAN) to optimize completion performance.
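The three input resolutions are obtained by farthest point down-sampling of the incomplete cloud. A minimal NumPy sketch of greedy farthest point sampling follows; this is not the authors' implementation, and the three resolutions (2048/1024/512 points) are illustrative assumptions:

```python
import numpy as np

def farthest_point_sample(points, n_samples, seed=0):
    """Greedy farthest point sampling: repeatedly pick the point
    farthest from the already-selected subset."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    selected = [int(rng.integers(n))]
    dist = np.full(n, np.inf)  # distance to nearest selected point
    for _ in range(n_samples - 1):
        d = np.sum((points - points[selected[-1]]) ** 2, axis=1)
        dist = np.minimum(dist, d)
        selected.append(int(np.argmax(dist)))
    return points[selected]

# Three resolutions of the same incomplete cloud (sizes are assumptions).
cloud = np.random.default_rng(1).random((2048, 3))
scales = [farthest_point_sample(cloud, k) for k in (2048, 1024, 512)]
```

Because each coarser scale is a subset chosen to cover the shape as evenly as possible, the three scales describe the same object at decreasing levels of detail.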
Result
Using Chamfer distance (CD) as the evaluation metric, the proposed algorithm is compared with four related methods on two datasets. On the ShapeNet dataset, the category-averaged CD of our algorithm is 3.73% lower than that of the second-best model, PF-Net (point fractal network); on the ModelNet10 dataset, it is 12.75% lower than PF-Net. Visualized completion results of the different algorithms show that our algorithm completes detailed structures more precisely and generalizes better to atypical samples within a category.
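Chamfer distance, the metric used above, averages the nearest-neighbour distance between two point sets in both directions. A brute-force NumPy sketch for illustration (not the paper's code; squared distances are assumed, as is common):

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point sets p (N, 3) and q (M, 3):
    mean squared distance from each point to its nearest neighbour in the
    other set, summed over both directions."""
    d = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)  # (N, M) pairwise
    return d.min(axis=1).mean() + d.min(axis=0).mean()

a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
b = a + np.array([0.1, 0.0, 0.0])  # b is a shifted by 0.1 along x
print(chamfer_distance(a, a))  # identical sets give 0.0
print(chamfer_distance(a, b))  # small positive value (0.02 here)
```

Lower CD means the completed cloud lies closer to the ground truth, which is why the percentage reductions above indicate improved completion accuracy.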
Conclusion
The proposed multi-scale Transformer-based point cloud completion algorithm better extracts local feature information from the incomplete point cloud, making the completion results more accurate.
Objective
Three-dimensional vision is a key research area in computer vision. Point cloud representation preserves the original geometric information in 3D space without discretization. Unfortunately, scanned 3D point clouds are often incomplete due to occlusion, constrained sensor resolution, and small viewing angles. Hence, a shape completion process is required for downstream 3D computer vision applications. Most deep learning based point cloud completion algorithms adopt an encoder-decoder structure and rely on multilayer perceptrons (MLP) to extract point cloud features at the encoder. However, MLP networks tend to focus on the overall shape of the point cloud, and it is difficult for them to extract the local structural features of an object effectively. In addition, MLPs do not generalize well to new objects, and it is difficult to complete the shapes of objects with few training samples. Therefore, designing an efficient and accurate local structural feature extraction algorithm for point cloud completion remains a challenging issue.
Method
The multi-scale Transformer based point cloud completion network (MSTCN) is presented. The entire network adopts an encoder-decoder structure composed of a multi-scale feature extractor, a pyramid point generator, and a Transformer-based discriminator. The encoder of MSTCN extracts and aggregates feature information from incomplete point clouds at three different resolutions through the Transformer module, feeds it into a fully connected decoder, and then gradually obtains the missing point clouds as outputs. A feature embedding layer (FEL) and an attention layer are built into the encoder. The former improves the encoder's ability to extract local structural features of the point cloud via sampling and neighborhood grouping; the latter obtains the correlation information among points with an improved self-attention module. In the decoder, the pyramid point generator is mainly composed of fully connected layers and reshape operations. Overall, the network operates in parallel on point clouds at three different resolutions, which are generated by farthest point down-sampling. Correspondingly, point cloud completion is divided into three stages to achieve coarse-to-fine processing in the pyramid point generator. Following the idea of generative adversarial networks (GAN), MSTCN adds a Transformer-based discriminator at the back end of the decoder, so that the discriminator and the generator promote each other in joint training and optimize the completion performance of the network. The loss function of MSTCN consists of two parts: a generating loss and an adversarial loss. The generating loss is the weighted sum of the Chamfer distances (CD) between the generated point clouds and their ground truths at the three scales, and the adversarial loss is the sum of the cross-entropy of the generated point cloud and its ground truth through the Transformer-based discriminator.
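The attention layer's core operation is self-attention over per-point features. A single-head scaled dot-product sketch in NumPy is given below; the paper's improved variant is not specified here, and the point count and feature dimension are illustrative assumptions:

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over a set of
    per-point features. x: (N, d); wq/wk/wv: (d, d) projections."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(x.shape[1])          # (N, N) pairwise affinities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over points
    return weights @ v                              # (N, d) re-weighted features

rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 8))          # 16 points, 8-dim features (assumed)
wq, wk, wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(feats, wq, wk, wv)
```

Each output feature is a convex combination of all point features, which is how the layer lets every point refer to correlated structures elsewhere in the cloud.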
Result
Experiments compare the proposed method with recent methods on the ShapeNet and ModelNet10 datasets. On the ShapeNet dataset, all 16 categories were used for training, and the category-averaged CD value of MSTCN was 3.73% lower than that of the second best model. Specifically, the CD values of the cap, car, chair, earphone, lamp, pistol, and table categories are better than those of the point fractal network (PF-Net). On the ModelNet10 dataset, the category-averaged CD value of MSTCN was 12.75% lower than that of the second best model. Specifically, the CD values of the bathtub, chair, desk, dresser, monitor, night-stand, sofa, table, and toilet categories are better than those of PF-Net. Visualization results on six categories (aircraft, hat, chair, headset, motorcycle, and table) show that MSTCN can accurately complete special structures and generalize to special samples within a category. Ablation studies were also conducted on the ShapeNet dataset. The full MSTCN network performs better than three reduced networks, namely MSTCN without the feature embedding layer, without the attention layer, and without the discriminator, respectively. This illustrates that the feature embedding layer makes the model more capable of extracting local structural information from point clouds, the attention layer lets the model selectively refer to the local structure of the input point cloud during completion, and the discriminator promotes the completion performance of the network. Meanwhile, three groups of completion sub-models for different missing ratios were trained on the ShapeNet dataset to verify the robustness of the MSTCN model to input point clouds with different missing ratios; the chair category was chosen and its completion results were visualized. MSTCN maintains a good completion effect even as the number of input points decreases, and the completion results for the 25% and 50% missing ratios have similar CD values. Even when the missing ratio reaches 75%, the CD value of the chair category remains at a low level of 2.074/2.456, and the entire chair shape can be identified and completed from the incomplete chair legs alone. This verifies that MSTCN is robust to input point clouds with different missing ratios.
Conclusion
A multi-scale Transformer based point cloud completion network (MSTCN) has been presented. MSTCN better extracts local feature information from the incomplete point cloud, which makes the point cloud completion results more accurate. Current point cloud completion algorithms have achieved good results on single objects. Future research can focus on the completion of large-scale scenes, because incomplete point clouds in scenes exhibit a variety of missing patterns, such as view missing, spherical missing, and occlusion missing; completing large scenes is more challenging and practical. On the other hand, point clouds of real scanned scenes have no ground-truth point cloud for reference, so unsupervised completion algorithms have an advantage over supervised ones.
three-dimensional point cloud; point cloud completion; autoencoder; attention mechanism; generative adversarial networks (GAN)
Achlioptas P, Diamanti O, Mitliagkas I and Guibas L. 2018. Learning representations and generative models for 3D point clouds//Proceedings of the 35th International Conference on Machine Learning. Stockholm, Sweden: PMLR: 40-49
Chang A X, Funkhouser T, Guibas L, Hanrahan P, Huang Q X, Li Z M, Savarese S, Savva M, Song S R, Su H, Xiao J X, Li Y and Yu F. 2015. ShapeNet: an information-rich 3D model repository[EB/OL]. [2021-06-20]. https://arxiv.org/pdf/1512.03012.pdf
Charles R Q, Li Y, Su H and Guibas L J. 2017b. PointNet++: deep hierarchical feature learning on point sets in a metric space//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 5105-5114
Charles R Q, Su H, Kaichun M and Guibas L J. 2017a. PointNet: deep learning on point sets for 3D classification and segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 77-85 [DOI: 10.1109/CVPR.2017.16]
Dai A, Qi C R and Nießner M. 2017. Shape completion using 3D-encoder-predictor CNNs and shape synthesis//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2463-2471 [DOI: 10.1109/CVPR.2017.693]
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X H, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J and Houlsby N. 2021. An image is worth 16x16 words: transformers for image recognition at scale[EB/OL]. [2021-06-05]. https://arxiv.org/pdf/2010.11929v1.pdf
Fan H Q, Su H and Guibas L. 2017. A point set generation network for 3D object reconstruction from a single image//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2463-2471 [DOI: 10.1109/CVPR.2017.264]
Goodfellow I J, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A and Bengio Y. 2014. Generative adversarial nets//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: 2672-2680
Guo M H, Cai J X, Liu Z N, Mu T J, Martin R R and Hu S M. 2021a. PCT: point cloud transformer. Computational Visual Media, 7(2): 187-199 [DOI: 10.1007/s41095-021-0229-5]
Guo Y L, Wang H Y, Hu Q Y, Liu H, Liu L and Bennamoun M. 2021b. Deep learning for 3D point clouds: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(12): 4338-4364 [DOI: 10.1109/TPAMI.2020.3005434]
Han X G, Li Z, Huang H B, Kalogerakis E and Yu Y Z. 2017. High-resolution shape completion using deep neural networks for global structure and local geometry inference//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 85-93 [DOI: 10.1109/ICCV.2017.19]
Huang Z T, Yu Y K, Xu J W, Ni F and Le X Y. 2020. PF-Net: point fractal network for 3D point cloud completion//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 7659-7667 [DOI: 10.1109/CVPR42600.2020.00768]
Kipf T N and Welling M. 2017. Semi-supervised classification with graph convolutional networks[EB/OL]. [2021-06-20]. https://arxiv.org/pdf/1609.02907.pdf
Li Y Y, Dai A, Guibas L and Nießner M. 2015. Database-assisted object retrieval for real-time 3D reconstruction. Computer Graphics Forum, 34(2): 435-446 [DOI: 10.1111/cgf.12573]
Lin T Y, Dollár P, Girshick R, He K M, Hariharan B and Belongie S. 2017. Feature pyramid networks for object detection//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 936-944 [DOI: 10.1109/CVPR.2017.106]
Long X X, Cheng X J, Zhu H, Zhang P J, Liu H M, Li J, Zheng L T, Hu Q Y, Liu H, Cao X, Yang R G, Wu Y H, Zhang G F, Liu Y B, Xu K, Guo Y L and Chen B Q. 2021. Recent progress in 3D vision. Journal of Image and Graphics, 26(6): 1389-1428 [DOI: 10.11834/jig.210043]
Mitra N J, Pauly M, Wand M and Ceylan D. 2013. Symmetry in 3D geometry: extraction and applications. Computer Graphics Forum, 32(6): 1-23 [DOI: 10.1111/cgf.12010]
Nguyen D T, Hua B S, Tran M K, Pham Q H and Yeung S K. 2016. A field model for repairing 3D shapes//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 5676-5684 [DOI: 10.1109/CVPR.2016.612]
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł and Polosukhin I. 2017. Attention is all you need//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: NIPS: 6000-6010
Wang Y, Sun Y B, Liu Z W, Sarma S E, Bronstein M M and Solomon J M. 2019. Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics, 38(5): #146 [DOI: 10.1145/3326362]
Wen X, Li T Y, Han Z Z and Liu Y S. 2020. Point cloud completion by skip-attention network with hierarchical folding//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 1936-1945 [DOI: 10.1109/CVPR42600.2020.00201]
Wu Z R, Song S R, Khosla A, Yu F, Zhang L G, Tang X O and Xiao J X. 2015. 3D ShapeNets: a deep representation for volumetric shapes//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 1912-1920 [DOI: 10.1109/CVPR.2015.7298801]
Xie H Z, Yao H X, Zhou S C, Mao J G, Zhang S P and Sun W X. 2020. GRNet: gridding residual network for dense point cloud completion//Proceedings of the 16th European Conference on Computer Vision. Glasgow, Scotland: Springer: 365-381 [DOI: 10.1007/978-3-030-58545-7_21]
Yang Y Q, Chen F, Shen Y R and Tian D. 2018. FoldingNet: point cloud auto-encoder via deep grid deformation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 206-215 [DOI: 10.1109/CVPR.2018.00029]
Yuan W T, Khot T, Held D, Mertz C and Hebert M. 2018. PCN: point completion network//Proceedings of 2018 International Conference on 3D Vision (3DV). Verona, Italy: IEEE: 728-737 [DOI: 10.1109/3DV.2018.00088]
Zhao H S, Jiang L, Jia J Y, Torr P and Koltun V. 2021. Point transformer[EB/OL]. [2021-06-20]. https://arxiv.org/pdf/2012.09164.pdf
Zhao W, Gao S M and Lin H W. 2007. A robust hole-filling algorithm for triangular mesh. The Visual Computer, 23(12): 987-997 [DOI: 10.1007/s00371-007-0167-y]