An implicit coding network for point cloud geometry compression
2024, Vol. 29, No. 12, Pages 3612-3627
Print publication date: 2024-12-16
DOI: 10.11834/jig.230906
Chen Jiahui, Fang Guangchi, Li Haoran, Zhang Ye, Huang Xiaohong, Guo Yulan. 2024. An implicit coding network for point cloud geometry compression. Journal of Image and Graphics, 29(12):3612-3627
Objective
Existing point cloud geometry compression algorithms typically convert point clouds into octrees or sparse points with latent features to improve the storage efficiency of the data structure. These methods quantize the point cloud onto 3D grid points, so the accuracy of the underlying surface is limited by the quantization resolution. To address this problem, this paper converts the point cloud into a continuous implicit representation and proposes a point cloud geometry compression framework based on implicit representations, thereby overcoming the adverse effect of the quantization resolution on compression quality.
Method
The framework consists of an implicit representation based on a signed logarithmic distance field and a neural network with a multiplicative branch structure. Specifically, in the encoding stage, a neural network is fitted to the implicit representation and then compressed by model compression; in the decoding stage, the point cloud surface is reconstructed with an improved Marching Cubes (MC) algorithm, and the point cloud is recovered by sampling the reconstructed surface.
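As a rough illustration of the encoding-side fitting target, the sketch below maps arbitrary query points to a signed, log-parameterized distance to the input point cloud. It is a minimal sketch under our own assumptions (the helper name sldf_target and the thickness value tau are hypothetical), not the paper's exact SLDF formulation.

```python
# Illustrative fitting target for a signed, log-parameterized distance field.
# The thickness value `tau` and the exact log mapping are assumptions; the
# paper's SLDF definition may differ.
import numpy as np
from scipy.spatial import cKDTree

def sldf_target(query_pts, cloud_pts, tau=0.01):
    """Map (M, 3) query points to signed, log-compressed distances to the cloud."""
    tree = cKDTree(cloud_pts)             # nearest-neighbour search structure
    dist, _ = tree.query(query_pts, k=1)  # unsigned distance to the point cloud
    signed = dist - tau                   # thickness assumption: negative inside the thin shell
    # logarithmic parameterization compresses the dynamic range of far distances
    return np.sign(signed) * np.log1p(np.abs(signed) / tau)
```

In the encoding stage, a small coordinate network is overfitted to regress such values from 3D coordinates, and the quantized network weights form the transmitted bitstream.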
Result
Point cloud surface compression experiments are conducted on the ABC (a big CAD model dataset), Famous, and MPEG PCC (MPEG point cloud compression) datasets. Compared with the baseline INR (implicit neural representations for image compression), the proposed algorithm reduces the L1 chamfer distance by 12.4% on average and improves the normal consistency and F-score metrics by 1.5% and 13.6% on average, and its compression efficiency grows with the number of model parameters, with an average gain of 12.9%. Compared with the geometry compression standard G-PCC (geometry-based point cloud compression), the proposed algorithm still maintains a D1-PSNR above 55 dB at a storage size of 10 KB, so its effective compression limit is higher than that of G-PCC. In addition, ablation experiments verify the effectiveness of the proposed implicit representation and network structure.
Conclusion
Experimental results show that the proposed point cloud compression algorithm overcomes the resolution limitation of existing methods and improves not only the surface reconstruction accuracy but also the compression efficiency and the effective compression limit for point cloud surfaces.
Objective
Point clouds captured by depth sensors or generated by reconstruction algorithms are essential for various 3D vision tasks, including 3D scene understanding, scan registration, and 3D reconstruction. However, even a simple scene or object contains massive numbers of unstructured points, which poses challenges for the storage and transmission of point cloud data. Therefore, developing point cloud geometry compression algorithms is important for handling and processing point cloud data effectively. Existing point cloud compression algorithms typically convert point clouds into a storage-efficient data structure, such as an octree representation or sparse points with latent features. These intermediate representations are then encoded into a compact bitstream by using either handcrafted or learning-based entropy coders. Although exploiting the correlation among spatial points effectively improves compression performance, existing algorithms may not fully exploit these points as representations of the object surface and topology. Recent studies have addressed this problem by exploring implicit representations and neural networks for surface reconstruction. However, these methods are primarily tailored to 3D objects represented as occupancy fields or signed distance fields, which limits their applicability to point clouds and non-watertight meshes in terms of surface representation and reconstruction. Furthermore, the neural networks used in these approaches often rely on simple multi-layer perceptron structures, which may lack capacity and compression efficiency for point cloud geometry compression tasks.
Method
To deal with these limitations, we propose a novel point cloud geometry compression framework consisting of a signed logarithmic distance field, an implicit network structure with multiplicative branches, and an adaptive marching cubes algorithm for surface extraction. First, we treat the point cloud surface as the zero level-set of a field that maps arbitrary points in space to the distance to their nearest points on the surface. We design an implicit representation called the signed logarithmic distance field (SLDF), which uses a thickness assumption and a logarithmic parameterization to fit point cloud surfaces of arbitrary topology. Afterward, we apply a multiplicative implicit neural encoding network (MINE) to encode the surface as a compact neural representation. MINE combines sinusoidal activation functions and multiplicative operators to enhance the capacity and distribution characteristics of the network. The overfitting process transforms the mapping from point cloud coordinates to the implicit distance field into a neural network, which is subsequently subjected to model compression. From the decompressed network, the continuous surface is reconstructed using the adaptive marching cubes algorithm (AMC), which incorporates a dual-layer surface fusion process to further improve the accuracy of surface extraction for SLDF.
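The sketch below conveys the flavor of a coordinate network that combines sinusoidal filters with multiplicative branches and is overfitted to a single shape. The class names, layer sizes, frequency setting, and branch wiring are illustrative assumptions, not the released MINE implementation.

```python
# Minimal PyTorch sketch of a coordinate network with sinusoidal filters and
# multiplicative branches (in the spirit of multiplicative filter networks).
import torch
import torch.nn as nn

class SineFilter(nn.Module):
    """Linear layer followed by a sinusoidal activation (SIREN-style filter)."""
    def __init__(self, in_dim, out_dim, omega=30.0):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.omega = omega

    def forward(self, x):
        return torch.sin(self.omega * self.linear(x))

class MultiplicativeCoordNet(nn.Module):
    """Hidden features are repeatedly multiplied by fresh sinusoidal filters of
    the raw coordinates, then passed through linear layers."""
    def __init__(self, in_dim=3, hidden=64, layers=3):
        super().__init__()
        self.filters = nn.ModuleList([SineFilter(in_dim, hidden) for _ in range(layers)])
        self.linears = nn.ModuleList([nn.Linear(hidden, hidden) for _ in range(layers - 1)])
        self.out = nn.Linear(hidden, 1)       # predicts the implicit distance value

    def forward(self, coords):
        z = self.filters[0](coords)
        for filt, lin in zip(self.filters[1:], self.linears):
            z = filt(coords) * lin(z)         # multiplicative combination of branches
        return self.out(z)

# Overfitting to one shape: regress (placeholder) distance values at sampled coordinates.
net = MultiplicativeCoordNet()
coords = torch.rand(1024, 3) * 2 - 1          # query points in [-1, 1]^3
target = torch.rand(1024, 1)                  # placeholder SLDF values for illustration
loss = nn.functional.l1_loss(net(coords), target)
loss.backward()
```

Each branch applies a new sinusoidal filter to the raw coordinates and multiplies it into the running feature, which is what distinguishes such multiplicative networks from a plain MLP with pointwise activations.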
Result
We compared our algorithm with six state-of-the-art algorithms, including surface compression approaches based on implicit representations and point cloud compression methods, on three public datasets, namely, ABC, Famous, and MPEG PCC. The quantitative evaluation metrics include the rate-distortion curves of the chamfer-L1 distance (L1-CD), normal consistency (NC), and F-score for continuous point cloud surfaces, and the rate-distortion curve of D1-PSNR for quantized point cloud surfaces. Compared with the second-best method, INR, our method reduces the L1-CD loss by 12.4% and improves NC and F-score by 1.5% and 13.6%, respectively, on the ABC and Famous datasets. Moreover, the compression efficiency increases by an average of 12.9% as the number of model parameters grows. On multiple MPEG PCC sequences taken from the 512-resolution MVUB dataset, the 1024-resolution 8iVFB dataset, and the 2048-resolution Owlii dataset, our method achieves a D1-PSNR above 55 dB at a storage size of about 10 KB, which highlights its higher effective compression limit compared with G-PCC. Ablation experiments show that without SLDF, the L1-CD loss increases by 18.53% and the D1-PSNR performance drops by 15 dB; similarly, without the MINE network, the L1-CD loss increases by 3.72% and the D1-PSNR performance drops by 2.67 dB.
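For context, the two headline geometry metrics can be computed as in the hedged sketch below. The function names are ours, and the PSNR peak value is left as a parameter because its choice (typically tied to the voxel-grid resolution) follows the MPEG evaluation protocol rather than being fixed here.

```python
# Sketch of symmetric chamfer-L1 distance and point-to-point (D1) PSNR.
import numpy as np
from scipy.spatial import cKDTree

def chamfer_l1(a, b):
    """Symmetric mean nearest-neighbour distance between point sets a and b."""
    d_ab, _ = cKDTree(b).query(a, k=1)
    d_ba, _ = cKDTree(a).query(b, k=1)
    return 0.5 * (d_ab.mean() + d_ba.mean())

def d1_psnr(rec, ref, peak):
    """Point-to-point geometry PSNR of a reconstruction against a reference."""
    d, _ = cKDTree(ref).query(rec, k=1)   # distance of each reconstructed point to the reference
    mse = np.mean(d ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```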
Conclusion
This work explores implicit representations for point cloud surfaces and proposes an enhanced point cloud compression framework. We first design the SLDF to extend implicit representations to point cloud surfaces with arbitrary topologies, and then use the multiplicative-branch network to enhance the capacity and distribution characteristics of the network. We further apply a surface extraction algorithm to improve the quality of the reconstructed point cloud. In this way, we obtain a unified framework for the geometric compression of point cloud surfaces at arbitrary resolutions. Experimental results demonstrate that the proposed method achieves promising performance in point cloud geometry compression.
point cloud geometry compression; implicit representation; surface reconstruction; model compression; surface extraction algorithm
Cao C, Preda M and Zaharia T. 2019. 3D point cloud compression: a survey//Proceedings of the 24th International Conference on 3D Web Technology. Los Angeles, USA: ACM: 1-9 [DOI: 10.1145/3329714.3338130]
Chang A X, Funkhouser T, Guibas L, Hanrahan P, Huang Q X, Li Z M, Savarese S, Savva M, Song S R, Su H, Xiao J X, Yi L and Yu F. 2015. ShapeNet: an information-rich 3D model repository [EB/OL]. [2024-01-15]. http://arxiv.org/pdf/1512.03012.pdf
Chen H, He B, Wang H Y, Ren Y X, Lim S N and Shrivastava A. 2021. NeRV: neural representations for videos//Proceedings of the 35th International Conference on Neural Information Processing Systems. Online: Curran Associates Inc.: 21557-21568
Chibane J, Mir A and Pons-Moll G. 2020. Neural unsigned distance fields for implicit function learning//Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc.: 21638-21652
d’Eon E, Harrison B, Myers T and Chou P A. 2017. 8i voxelized full bodies-a voxelized point cloud dataset. ISO/IEC JTC1/SC29 Joint WG11/WG1 (MPEG/JPEG) input document WG11M40059/WG1M74006. Geneva, Switzerland: MPEG
Dupont E, Goliński A, Alizadeh M, Teh Y W and Doucet A. 2021. COIN: compression with implicit neural representations [EB/OL]. [2024-01-15]. http://arxiv.org/pdf/2103.03123.pdf
Erler P, Guerrero P, Ohrhallinger S, Mitra N J and Wimmer M. 2020. Points2Surf: learning implicit surfaces from point clouds//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 108-124 [DOI: 10.1007/978-3-030-58558-7_7]
Fan H Q, Su H and Guibas L. 2017. A point set generation network for 3D object reconstruction from a single image//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2463-2471 [DOI: 10.1109/CVPR.2017.264]
Fang G C, Hu Q Y, Wang H Y, Xu Y L and Guo Y L. 2022. 3DAC: learning attribute compression for point clouds//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 14799-14808 [DOI: 10.1109/CVPR52688.2022.01440]
Fathony R, Sahu A K, Willmott D and Kolter J Z. 2021. Multiplicative filter networks//Proceedings of the 9th International Conference on Learning Representations. Virtual Event, Austria: OpenReview.net
Han S, Mao H Z and Dally W J. 2016. Deep compression: compressing deep neural network with pruning, trained quantization and huffman coding//Proceedings of the 4th International Conference on Learning Representations. San Juan, USA: OpenReview.net
Hart J C. 1996. Sphere tracing: a geometric method for the antialiased ray tracing of implicit surfaces. The Visual Computer, 12(10): 527-545 [DOI: 10.1007/s003710050084]
Hinton G, Vinyals O and Dean J. 2015. Distilling the knowledge in a neural network [EB/OL]. [2024-01-15]. http://arxiv.org/pdf/1503.02531.pdf
Huang L L, Wang S L, Wong K, Liu J and Urtasun R. 2020. OctSqueeze: octree-structured entropy model for LiDAR compression//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 1310-1320 [DOI: 10.1109/CVPR42600.2020.00139]
Jang E S, Preda M, Mammou K, Tourapis A M, Kim J, Graziosi D B, Rhyu S and Budagavi M. 2019. Video-based point-cloud-compression standard in MPEG: from evidence collection to committee draft [Standards in a Nutshell]. IEEE Signal Processing Magazine, 36(3): 118-123 [DOI: 10.1109/MSP.2019.2900721]
Koch S, Matveev A, Jiang Z S, Williams F, Artemov A, Burnaev E, Alexa M, Zorin D and Panozzo D. 2019. ABC: a big CAD model dataset for geometric deep learning//Proceedings of 2019 IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 9593-9603 [DOI: 10.1109/CVPR.2019.00983]
Lindell D B, Van Veen D, Park J J and Wetzstein G. 2022. BACON: band-limited coordinate networks for multiscale scene representation//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 16231-16241 [DOI: 10.1109/CVPR52688.2022.01577]
Long X X, Cheng X J, Zhu H, Zhang P J, Liu H M, Li J, Zheng L T, Hu Q Y, Liu H, Cao X, Yang R G, Wu Y H, Zhang G F, Liu Y B, Xu K, Guo Y L and Chen B Q. 2021. Recent progress in 3D vision. Journal of Image and Graphics, 26(6): 1389-1428 [DOI: 10.11834/jig.210043]
Long X X, Lin C, Liu L J, Liu Y, Wang P, Theobalt C, Komura T and Wang W P. 2023. NeuralUDF: learning unsigned distance fields for multi-view reconstruction of surfaces with arbitrary topologies//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 20834-20843 [DOI: 10.1109/CVPR52729.2023.01996]
Loop C, Cai Q, Escolano S O and Chou P A. 2016. Microsoft voxelized upper bodies-a voxelized point cloud dataset. ISO/IEC JTC1/SC29 Joint WG11/WG1 (MPEG/JPEG) input document m38673/M72012. Geneva, Switzerland: MPEG
Lorensen W E and Cline H E. 1987. Marching cubes: a high resolution 3D surface construction algorithm//Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques. [s.l.]: ACM: 163-169 [DOI: 10.1145/37401.37422]
Ma B, Han Z, Liu Y S and Zwicker M. 2021. Neural-Pull: learning signed distance functions from point clouds by learning to pull space onto surfaces//Proceedings of the 38th International Conference on Machine Learning. Online: PMLR, 139: 7246-7257 [DOI: 10.48550/arXiv.2011.13495]
Maglo A, Lavoué G, Dupont F and Hudelot C. 2015. 3D mesh compression: survey, comparisons, and emerging trends. ACM Computing Surveys, 47(3): #44 [DOI: 10.1145/2693443]
Mamou K, Zaharia T and Prêteux F. 2009. A triangle-fan-based approach for low complexity 3D mesh compression//Proceedings of the 16th IEEE International Conference on Image Processing. Cairo, Egypt: IEEE: 3513-3516 [DOI: 10.1109/ICIP.2009.5414075]
Mescheder L, Oechsle M, Niemeyer M, Nowozin S and Geiger A. 2019. Occupancy networks: learning 3D reconstruction in function space//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 4455-4465 [DOI: 10.1109/CVPR.2019.00459]
Mildenhall B, Srinivasan P P, Tancik M, Barron J T, Ramamoorthi R and Ng R. 2020. NeRF: representing scenes as neural radiance fields for view synthesis//Proceedings of the 16th European Conference on Computer Vision. Online: Springer: 405-421 [DOI: 10.1007/978-3-030-58452-8_24]
Park J J, Florence P, Straub J, Newcombe R and Lovegrove S. 2019. DeepSDF: learning continuous signed distance functions for shape representation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 165-174 [DOI: 10.1109/CVPR.2019.00025]
Quach M, Valenzise G and Dufaux F. 2020. Improved deep point cloud geometry compression//Proceedings of 2020 IEEE 22nd International Workshop on Multimedia Signal Processing. Tampere, Finland: IEEE: 1-6 [DOI: 10.1109/MMSP48831.2020.9287077]
Que Z Z, Lu G and Xu D. 2021. VoxelContext-Net: an octree based framework for point cloud compression//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 6038-6047 [DOI: 10.1109/CVPR46437.2021.00598]
Ren S Y, Hou J H, Chen X D, He Y and Wang W P. 2023. GeoUDF: surface reconstruction from 3D point clouds via geometry-guided distance representation//Proceedings of 2023 IEEE/CVF International Conference on Computer Vision. Paris, France: IEEE: 14168-14178 [DOI: 10.1109/ICCV51070.2023.01307]
Schwarz S, Preda M, Baroncini V, Budagavi M, Cesar P, Chou P A, Cohen R A, Krivokuća M, Lasserre S, Li Z, Llach J, Mammou K, Mekuria R, Nakagami O, Siahaan E, Tabatabai A, Tourapis A M and Zakharchenko V. 2019. Emerging MPEG standards for point cloud compression. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 9(1): 133-148 [DOI: 10.1109/JETCAS.2018.2885981]
Sitzmann V, Martel J N P, Bergman A W, Lindell D B and Wetzstein G. 2020. Implicit neural representations with periodic activation functions//Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc.: 7462-7473 [DOI: 10.48550/arXiv.2006.09661]
Strümpler Y, Postels J, Yang R, van Gool L and Tombari F. 2022. Implicit neural representations for image compression//Proceedings of the 17th European Conference on Computer Vision. Tel Aviv, Israel: Springer: 74-91 [DOI: 10.1007/978-3-031-19809-0_5]
Sullivan G J, Ohm J R, Han W J and Wiegand T. 2012. Overview of the high efficiency video coding (HEVC) standard. IEEE Transactions on Circuits and Systems for Video Technology, 22(12): 1649-1668 [DOI: 10.1109/TCSVT.2012.2221191]
Tatarchenko M, Richter S R, Ranftl R, Li Z W, Koltun V and Brox T. 2019. What do single-view 3D reconstruction networks learn?//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 3400-3409 [DOI: 10.1109/CVPR.2019.00352]
Turk G and Levoy M. 1994. Zippered polygon meshes from range images//Proceedings of the 21st Annual Conference on Computer Graphics and Interactive Techniques. New York, USA: ACM: 311-318 [DOI: 10.1145/192161.192241]
Wang B, Yu Z D, Yang B, Qin J, Breckon T, Shao L, Trigoni N and Markham A. 2022. RangeUDF: semantic surface reconstruction from 3D point clouds [EB/OL]. [2024-01-15]. http://arxiv.org/pdf/2204.09138.pdf
Wang J Q, Ding D D, Li Z and Ma Z. 2021. Multiscale point cloud geometry compression//Proceedings of 2021 Data Compression Conference. Snowbird, USA: IEEE: 73-82 [DOI: 10.1109/DCC50243.2021.00015]
Xu Y, Lu Y and Wen Z. 2017. Owlii dynamic human mesh sequence dataset. ISO/IEC JTC1/SC29/WG11 m41658. Macau, China: MPEG
Yang Y Q, Feng C, Shen Y R and Tian D. 2018. FoldingNet: point cloud auto-encoder via deep grid deformation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 206-215 [DOI: 10.1109/CVPR.2018.00029]
Ye J L, Chen Y T, Wang N Y and Wang X L. 2022. GIFS: neural implicit function for general shape representation//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 12819-12829 [DOI: 10.1109/CVPR52688.2022.01249]
Yu P P, Zuo D, Huang Y E, Huang R S, Wang H Y, Guo Y L and Liang F. 2023. Sparse representation based deep residual geometry compression network for large-scale point clouds//Proceedings of 2023 IEEE International Conference on Multimedia and Expo. Brisbane, Australia: IEEE: 2555-2560 [DOI: 10.1109/ICME55011.2023.00435]
Zhou J S, Ma B R, Li S J, Liu Y S and Han Z Z. 2023. Learning a more continuous zero level set in unsigned distance fields through level set projection//Proceedings of 2023 IEEE/CVF International Conference on Computer Vision. Paris, France: IEEE: 3158-3169 [DOI: 10.1109/ICCV51070.2023.00295]
Zhu W, Zhang Y H, Ying Y, Zheng Y Y and He D F. 2023. A dense residual structure and multi-scale pruning-relevant point cloud compression network. Journal of Image and Graphics, 28(7): 2105-2119 [DOI: 10.11834/jig.220047]