Tactile-enhanced graph convolutional point cloud super-resolution network
2024, pages 1-13
Online publication date: 2024-09-10
DOI: 10.11834/jig.230662
Zhang Chi, Li Jian, Wang Puzheng, et al. Tactile-enhanced graph convolutional point cloud super-resolution network[J]. Journal of Image and Graphics,
Objective
With the rapid development of 3D scanners and 3D point cloud acquisition technology, 3D point clouds are applied ever more widely in computer vision, robot guidance, industrial design, and other fields. However, owing to limits on sensor resolution, scanning time, and scanning conditions, the acquired point clouds are often sparse and cannot meet the requirements of many application tasks, so upsampling is generally used to obtain dense point clouds. Because the original sparse point cloud lacks detail, upsampling a single low-resolution point cloud usually yields poor results.
Method
This paper proposes, for the first time, a tactile-enhanced graph convolutional point cloud super-resolution network. Its core idea is to extract tactile features with dynamic graph convolution and fuse them with low-resolution point cloud features to obtain a more accurate high-resolution point cloud. Tactile point clouds are denser and more precise than low-resolution point clouds and are relatively easy to acquire; fusing them with the original sparse point cloud therefore yields more accurate local features and effectively improves upsampling accuracy.
Result
We first constructed a 3D vision-and-touch dataset (3DVT) for point cloud super-resolution, containing 12,732 samples, of which 70% are used for training and 30% for testing. We then evaluated the network on this dataset using chamfer distance as the metric. Experiments show that without tactile information the average chamfer distance of the super-resolved point cloud is 3.009 × 10⁻³; with one tactile information fusion it decreases to 1.931 × 10⁻³; and with two tactile information fusions it decreases further to 1.916 × 10⁻³, verifying that the proposed network improves point cloud super-resolution. Visualizations of different objects also show that with tactile assistance the upsampled point clouds are distributed more uniformly and have smoother edges. Further noise experiments show that, aided by tactile information, the proposed network is more robust to noise. In comparative experiments on the 3DVT dataset, the proposed method reduces the average chamfer distance by 19.22% relative to the latest existing algorithm, achieving the best results.
Conclusion
The proposed tactile-enhanced graph convolutional point cloud super-resolution network, which uses dynamic graph convolution to extract tactile point cloud features and fuses them with the low-resolution point cloud, effectively improves the quality of the reconstructed high-resolution point cloud and is robust to ambient noise.
Objective
With the rapid development of 3D scanners and 3D point cloud acquisition technologies, the application of 3D point clouds in computer vision, robot guidance, industrial design, and other fields has become increasingly widespread. As long as the point cloud is sufficiently dense, accurate models can be constructed to meet the demands of various advanced point cloud tasks; accurate point clouds facilitate better performance in tasks such as semantic segmentation, completion, and classification. However, due to limitations such as sensor resolution, scanning time, and scanning conditions, the acquired point clouds are often sparse. Existing point cloud upsampling methods only address single low-resolution point clouds, yield poor results when upsampling highly sparse point clouds at larger magnification rates, and do not use additional modalities for assistance. Meanwhile, tactile information has gradually been applied to 3D reconstruction, where complete 3D models are reconstructed from multimodal information such as RGB images, depth images, and tactile measurements. However, tactile point clouds have not yet been applied to point cloud super-resolution.
Method
In this study, we propose a tactile-enhanced graph convolutional point cloud super-resolution network that uses dynamic graph convolution to extract tactile features and fuse them with low-resolution point cloud features to obtain more accurate high-resolution point clouds. The network consists of a feature extraction module and an upsampling module. The feature extraction module extracts features from the low-resolution point cloud and the tactile point cloud, while the upsampling module performs feature expansion and coordinate reconstruction to output the high-resolution point cloud. The key to the network lies in extracting features from tactile point clouds and fusing them with low-resolution point cloud features. The tactile feature extraction module adopts a multilayer perceptron (MLP) followed by four cascaded dynamic graph convolution layers. The tactile point cloud is mapped to a high-dimensional space through the MLP for subsequent feature extraction. Each dynamic graph convolution consists mainly of a k-nearest-neighbors (KNN) search and an edge convolution: the KNN algorithm recomputes the neighbors of each point and constructs the graph structure, effectively aggregating local feature information, while the edge convolution extracts features from center points and their neighbors. Because the k nearest neighbors of each point vary across network layers, the graph structure is dynamically updated in every layer. Feature extraction for the low-resolution point cloud adopts graph convolution, where the graph structure is first constructed with KNN and then shared by subsequent layers. After the features of the low-resolution point cloud and the tactile point cloud are fused, the point cloud features undergo further progressive feature extraction, mainly through densely connected graph convolution modules. A bottleneck layer compresses the features to reduce computational complexity in subsequent layers. Two parallel dense graph convolutions extract local features, while a global pooling layer extracts global features. Finally, a feature rearrangement module and a coordinate reconstruction module map the high-dimensional features back to three-dimensional coordinates. Compared with the low-resolution point cloud, the local tactile point cloud is denser and more precise, whereas the low-resolution point cloud is sparser and contains less local information; with the assistance of tactile information, enhanced local features can be obtained.
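To make the tactile feature-extraction path concrete, the following is a minimal PyTorch sketch of an MLP lift followed by four cascaded dynamic graph convolution (EdgeConv) layers that rebuild the kNN graph at every layer, as described above. The class names, channel widths, and k = 20 are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch of dynamic graph convolution for tactile feature extraction.
import torch
import torch.nn as nn

def knn_graph(x, k):
    """Indices of the k nearest neighbors of each point. x: (B, C, N)."""
    inner = -2 * torch.matmul(x.transpose(2, 1), x)   # (B, N, N)
    xx = torch.sum(x ** 2, dim=1, keepdim=True)       # (B, 1, N)
    dist = -xx - inner - xx.transpose(2, 1)           # negative squared distance
    return dist.topk(k=k, dim=-1)[1]                  # (B, N, k)

def edge_features(x, k):
    """Build EdgeConv inputs [x_i, x_j - x_i] over a freshly computed kNN graph."""
    B, C, N = x.shape
    idx = knn_graph(x, k)                             # graph recomputed per call
    idx = idx + torch.arange(B, device=x.device).view(-1, 1, 1) * N
    flat = x.transpose(2, 1).reshape(B * N, C)
    neigh = flat[idx.view(-1)].view(B, N, k, C)
    centre = x.transpose(2, 1).unsqueeze(2).expand(-1, -1, k, -1)
    return torch.cat([centre, neigh - centre], dim=3).permute(0, 3, 1, 2)

class DynamicEdgeConv(nn.Module):
    def __init__(self, c_in, c_out, k=20):
        super().__init__()
        self.k = k
        self.conv = nn.Sequential(
            nn.Conv2d(2 * c_in, c_out, 1, bias=False),
            nn.BatchNorm2d(c_out), nn.LeakyReLU(0.2))

    def forward(self, x):
        e = edge_features(x, self.k)        # kNN graph rebuilt here -> "dynamic"
        return self.conv(e).max(dim=-1)[0]  # max over neighbors: (B, c_out, N)

class TactileEncoder(nn.Module):
    """MLP lift + four cascaded dynamic EdgeConv layers (sizes assumed)."""
    def __init__(self, k=20):
        super().__init__()
        self.lift = nn.Sequential(nn.Conv1d(3, 64, 1), nn.ReLU())
        self.layers = nn.ModuleList([DynamicEdgeConv(64, 64, k) for _ in range(4)])

    def forward(self, pts):                 # pts: (B, 3, M) tactile points
        x = self.lift(pts)
        for layer in self.layers:
            x = layer(x)
        return x                            # (B, 64, M) tactile features
```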
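The fusion and upsampling stages might look like the sketch below. The paper's exact per-point fusion scheme is not specified in this abstract, so this version pools the tactile features into a global code and concatenates it onto every low-resolution point feature before a shuffle-style feature expansion (feature rearrangement) and a coordinate-reconstruction MLP; the fusion scheme, channel sizes, and upsampling rate r are all assumptions.

```python
# Hedged sketch of feature fusion, feature rearrangement, and
# coordinate reconstruction.
import torch
import torch.nn as nn

class FuseAndUpsample(nn.Module):
    def __init__(self, c_lr=64, c_tac=64, c_mid=128, r=4):
        super().__init__()
        self.r = r
        self.fuse = nn.Sequential(nn.Conv1d(c_lr + c_tac, c_mid, 1), nn.ReLU())
        # Feature expansion: one 1x1 conv emits r feature copies per point.
        self.expand = nn.Conv1d(c_mid, c_mid * r, 1)
        # Coordinate reconstruction: high-dimensional features -> xyz.
        self.coords = nn.Sequential(
            nn.Conv1d(c_mid, 64, 1), nn.ReLU(), nn.Conv1d(64, 3, 1))

    def forward(self, f_lr, f_tac):
        # f_lr: (B, c_lr, N) low-res features; f_tac: (B, c_tac, M) tactile features
        g = f_tac.max(dim=-1, keepdim=True)[0]       # (B, c_tac, 1) global tactile code
        g = g.expand(-1, -1, f_lr.shape[-1])         # broadcast onto every point
        f = self.fuse(torch.cat([f_lr, g], dim=1))   # (B, c_mid, N) fused features
        B, _, N = f.shape
        # Periodic shuffle: rearrange (c_mid * r, N) into (c_mid, r * N).
        e = self.expand(f).view(B, -1, self.r, N).reshape(B, -1, self.r * N)
        return self.coords(e)                        # (B, 3, r * N) dense points
```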
Result
In this study, we constructed a point cloud super-resolution dataset with tactile information, 3DVT (3D vision and touch), and trained the network on it. The dataset covers a diverse range of object categories with a sufficiently large number of samples. Using chamfer distance as the evaluation metric, experiments show that without tactile information the average chamfer distance is 3.009 × 10⁻³; with one instance of tactile information it decreases to 1.931 × 10⁻³; and with two instances of tactile information it further decreases to 1.916 × 10⁻³. Tactile point clouds thus enhance the quality of high-resolution point clouds and serve as an effective auxiliary signal for point cloud super-resolution. Visualizations of different objects demonstrate that, with tactile assistance, the upsampled point clouds are distributed more uniformly and their edges become smoother; the network also better fills holes in the point cloud and produces fewer outliers. Quantitative results for chamfer distance and density-aware chamfer distance on different objects confirm the effectiveness of tactile assistance in the super-resolution task, and for complex objects the improvement is even more pronounced. Noise experiments show that at a noise level of 1% the average chamfer distance is 3.132 × 10⁻³ without tactile information versus 1.954 × 10⁻³ with two instances of tactile information; at a noise level of 3% it is 3.331 × 10⁻³ without tactile information versus 2.001 × 10⁻³ with two instances. These experiments demonstrate that tactile information reduces the impact of noise on the network, indicating strong robustness.
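For reference, the chamfer distance used above is the symmetric average of nearest-neighbor distances between the predicted and ground-truth clouds. Below is a minimal sketch; whether squared or unsquared distances are averaged is a convention that varies between papers and is assumed here.

```python
# Minimal chamfer distance between two batched point clouds.
import torch

def chamfer_distance(p, q):
    """p: (B, N, 3) predicted points; q: (B, M, 3) ground-truth points."""
    d = torch.cdist(p, q) ** 2            # (B, N, M) pairwise squared distances
    d_pq = d.min(dim=2)[0].mean(dim=1)    # each predicted point -> nearest GT point
    d_qp = d.min(dim=1)[0].mean(dim=1)    # each GT point -> nearest prediction
    return (d_pq + d_qp).mean()           # scalar, averaged over the batch
```

The values reported above (e.g., 1.916 × 10⁻³ with two tactile fusions) are averages of this metric over the 3DVT test split.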
Conclusion
Dynamic graph convolution effectively extracts initial features from tactile point clouds, and these features contain rich local information; through feature fusion, they effectively assist the point cloud super-resolution task. The tactile-enhanced graph convolutional point cloud super-resolution network proposed in this paper uses dynamic graph convolution to extract tactile features and fuses them with low-resolution point cloud features, effectively improving the quality of high-resolution point clouds while exhibiting strong robustness. A strength of the method is that it achieves better results by incorporating tactile information without altering the network architecture. It can provide high-quality point clouds for advanced visual tasks such as point cloud classification and object detection, laying a foundation for the further development and application of point clouds.
Point cloud super-resolution; tactile point clouds; feature extraction; feature fusion; dynamic graph convolution; multimodality