Semantic and consistent neural radiance field reconstruction method based on intrinsic decomposition via classification
Vol. 30, No. 2, 2025, pp. 559-574
Print publication date: 2025-02-16
DOI: 10.11834/jig.240140
Zeng Zhihong, Wang Zongji, Zhang Yuanben, Cai Weinan, Zhang Lili, Guo Yan, Liu Junyi. 2025. Semantic and consistent neural radiance field reconstruction method based on intrinsic decomposition via classification. Journal of Image and Graphics, 30(02):0559-0574
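Objective
3D scene reconstruction and novel view synthesis based on neural radiance fields (NeRF) are attracting wide attention from researchers. However, existing NeRF methods usually learn a highly specialized representation of a given scene and represent its geometry and appearance as a “mixed field” that entangles the two. This makes it inconvenient to edit scene geometry and appearance, to generalize across scenes, and to reuse 3D assets.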
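Method
A NeRF classification network that learns the intrinsic properties of objects is proposed. Specular highlights and shadows are removed through image enhancement, and color decomposition is performed through classification; that is, the intrinsic properties of semantic-level objects in indoor scenes are extracted from real-world scenes, and the NeRF is reconstructed on this basis. Two modules are proposed: a front-point dominance module and a color classification module. The front-point dominance module optimizes the intrinsic properties represented by each ray during the volume rendering stage, which improves the semantic consistency of the radiance field. The color classification module refines the classification of intrinsic properties through a fully connected network during the radiance field reconstruction stage, which improves the semantic and cross-view consistency of the radiance field. Working together, the two modules give the reconstructed radiance field good generalization with respect to appearance, supporting tasks such as scene recoloring, relighting, and editing of shadows and specular highlights.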
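One way to read “color decomposition by classification” is as a softmax over a fixed palette of albedo classes: a fully connected head maps a sample's feature vector to K logits, and the albedo is the probability-weighted mix of palette colors, so each point is pulled toward one of a few shared colors. The sketch below assumes this formulation. The palette, the head weights, the 2-dimensional feature (the paper's feature vector is 256-dimensional), and the function names are all hypothetical and are not taken from the paper.

```python
import math
from typing import List

Vector = List[float]
Color = List[float]  # (r, g, b) in [0, 1]


def linear(weights: List[Vector], bias: Vector, x: Vector) -> Vector:
    """One fully connected layer: W @ x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(weights, bias)]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_albedo(feature: Vector, weights: List[Vector], bias: Vector,
                    palette: List[Color], hard: bool = False) -> Color:
    """Hypothetical color-classification head.

    Soft mode returns sum_k p_k * palette[k], which stays differentiable for training.
    Hard mode returns palette[argmax_k p_k], a single discrete albedo class.
    """
    probs = softmax(linear(weights, bias, feature))
    if hard:
        k = max(range(len(probs)), key=probs.__getitem__)
        return list(palette[k])
    return [sum(p * color[ch] for p, color in zip(probs, palette)) for ch in range(3)]


# Hypothetical 2-class palette and head weights.
palette = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
W = [[2.0, 0.0], [0.0, 2.0]]
b = [0.0, 0.0]
feature = [1.0, 0.0]

print(classify_albedo(feature, W, b, palette, hard=True))  # -> [0.8, 0.1, 0.1]
soft = classify_albedo(feature, W, b, palette)  # convex mix, dominated by class 0
```

Because the soft output is always a convex combination of palette entries, two views that assign a surface point the same class receive the same albedo, which is one plausible mechanism for cross-view consistency.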
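Result
Extensive experiments on the Replica dataset, in comparison with Intrinsic NeRF (an existing method that performs intrinsic decomposition by learning a NeRF), show that under limited GPU memory and running time the reconstructed intrinsic-property radiance field has semantic and cross-view consistency. For the front-point dominance module, which targets semantic consistency, the proposed method improves on the baseline Semantic NeRF by 4.1% and on the variant without this module by 3.9%. For the color classification module, which targets the semantic and cross-view consistency of intrinsic decomposition, the proposed method improves on the intrinsic decomposition of Intrinsic NeRF by 10.2% and on the variant without the color classification layer by 1.7%.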
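Conclusion
The intrinsic-property radiance field constructed by the proposed method has semantic and cross-view consistency, can describe the geometric relationships of complex scenes, and generalizes well in appearance. It achieves realistic, view-consistent results in scene recoloring, relighting, and editing of shadows and specular highlights.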
Objective
Reconstructing indoor and outdoor 3D scenes and placing 3D assets in the real world are important directions in computer vision. Early work used voxel, occupancy, grid, and other computer graphics representations, which achieved good storage and rendering efficiency in a variety of mature applications. However, these representations depend on manual modeling, which is time-consuming and laborious and requires experienced modelers. Simplifying this modeling process would broaden the applications of 3D scene reconstruction. By querying an implicit field representation, researchers can obtain a realistic scene end-to-end, eliminating the complicated process of traditional modeling. The neural radiance field (NeRF) is one of the most popular implicit field representations. Compared with other implicit field methods, NeRF is known for its simplicity and ease of use, but it inherits the limitations of implicit fields themselves. An implicit field is a multidimensional function defined on spatial and directional coordinates that encodes scene geometry together with appearance color, so the independent attributes of a target become entangled in a single representation, which hampers the reuse of 3D assets. An important research direction for implicit fields is therefore the “disentanglement” of geometry and appearance. Intrinsic decomposition is one approach: it relies on physical priors instead of complex network initialization, and it decomposes an image into an albedo image that is independent of the viewing direction and a shading image that depends on it. Intrinsic NeRF was the first to apply intrinsic decomposition to NeRF, but its decomposition does not yield sufficiently plausible appearance editing results.
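The albedo/shading split mentioned above is commonly written per pixel as I = A · S, the Lambertian intrinsic image model that goes back to Retinex theory (Land and McCann, 1971). The sketch below is a minimal illustration under that assumption, not the paper's exact image model. It composes an image from albedo and shading and shows why physical priors are needed: scaling the albedo by any k and the shading by 1/k reproduces exactly the same image, so one image alone cannot fix the split.

```python
from typing import List, Tuple

Pixel = List[float]  # (r, g, b)


def compose(albedo: List[Pixel], shading: List[float]) -> List[Pixel]:
    """Lambertian intrinsic model I = A * S, with a scalar (gray) shading per pixel."""
    return [[a * s for a in px] for px, s in zip(albedo, shading)]


def rescale(albedo: List[Pixel], shading: List[float], k: float) -> Tuple[List[Pixel], List[float]]:
    """Return (k * A, S / k): a different decomposition of the same image."""
    return [[a * k for a in px] for px in albedo], [s / k for s in shading]


# Same material at two pixels; the second pixel lies in shadow.
albedo = [[0.5, 0.25, 0.125], [0.5, 0.25, 0.125]]
shading = [1.0, 0.5]

image = compose(albedo, shading)
alt_albedo, alt_shading = rescale(albedo, shading, 2.0)
print(compose(alt_albedo, alt_shading) == image)  # -> True
print(image[0] == image[1])  # -> False: shading alone changes the observed color
```

The second print is the other half of the argument: the two pixels share a material, yet their observed colors differ, which is exactly what a view-independent albedo layer is meant to factor out.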
Method
In this study, a NeRF classification network is proposed to learn the intrinsic properties and target characteristics of objects. It removes specular components from 2D images through image enhancement, extracts intrinsic colors (i.e., performs intrinsic decomposition) through classification, and then represents the shading maps and direct illumination of semantic-level objects in the scene through the intrinsic decomposition. A NeRF is then learned on this basis. Semantic consistency is provided by “the front-point dominance module”, which operates in the volume rendering stage and optimizes the albedo using “front points”. Consistency between views is provided by “the color classification layer module”, a fully connected network in the reconstruction stage that keeps the albedo fixed across viewpoints. The result is a neural radiance field that represents the intrinsic properties of the scene. Rays are generated from the intrinsic and extrinsic parameters of each image, and the sampling-point positions and ray directions are computed and processed by the neural network to produce the corresponding 3D properties. An embedding layer maps the positions and directions to high-dimensional embedding features, which serve as the network inputs. After an 8-layer fully connected multilayer perceptron, the network outputs a 1-dimensional volume density, a 256-dimensional feature vector, and an
N-dimensional semantic vector (where N is the number of semantic classes). The 256-dimensional feature vector is then fed into separate 1-layer fully connected networks to obtain the color, the shading map, and the direct illumination. In the inference stage, the model uses Monte Carlo integration to convert properties defined at sampling points into properties defined on rays (i.e., pixels), producing the synthesized novel view. The model disentangles attributes that are independent of the observer from those that depend on it. The resulting albedo is independent of the viewing direction and is consistent both semantically and across views. The implicit field generalizes well in appearance and supports scene recoloring, relighting, and editing of shadows and specular components.
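The conversion from per-sample outputs to per-pixel values described above is, in standard NeRF, a discrete volume rendering estimate. With densities σ_i and sample spacings δ_i along a ray, each sample receives the weight w_i = T_i (1 − exp(−σ_i δ_i)), where T_i = exp(−Σ_{j<i} σ_j δ_j) is the transmittance, and any per-sample attribute x_i is rendered as Σ_i w_i x_i. The sketch below assumes that all output heads (albedo, shading, semantic logits) share these weights, as Semantic NeRF does for its semantic logits. The function names, head names, and sample values are illustrative.

```python
import math
from typing import Dict, List


def render_weights(sigmas: List[float], deltas: List[float]) -> List[float]:
    """NeRF quadrature weights w_i = T_i * (1 - exp(-sigma_i * delta_i))."""
    weights, transmittance = [], 1.0
    for sigma, delta in zip(sigmas, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha  # T_{i+1} = T_i * exp(-sigma_i * delta_i)
    return weights


def composite(weights: List[float], values: List[List[float]]) -> List[float]:
    """Weighted sum of per-sample vectors along one ray."""
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def render_ray(sigmas: List[float], deltas: List[float],
               heads: Dict[str, List[List[float]]]) -> Dict[str, List[float]]:
    """Render every attribute head with the same volume rendering weights."""
    w = render_weights(sigmas, deltas)
    return {name: composite(w, vals) for name, vals in heads.items()}


# Three samples: empty space, then two dense samples.
sigmas = [0.0, 1e3, 1e3]
deltas = [0.1, 0.1, 0.1]
heads = {
    "albedo": [[0.9, 0.9, 0.9], [0.2, 0.4, 0.6], [0.7, 0.1, 0.1]],
    "shading": [[1.0], [0.5], [0.8]],
    "semantic_logits": [[0.0, 0.0], [3.0, -1.0], [-2.0, 2.0]],
}
out = render_ray(sigmas, deltas, heads)
print(out["albedo"])  # approximately [0.2, 0.4, 0.6]
```

In this example the first dense sample absorbs essentially all of the transmittance, so every rendered attribute is taken from that one sample. Because the weights are shared across heads, the rendered albedo, shading, and semantic label of a pixel all refer to the same surface point.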
Result
Compared with Intrinsic NeRF, an existing method that performs intrinsic decomposition by learning a NeRF, experiments on the Replica dataset show that, under limited GPU memory and running time, the proposed method obtains intrinsic decomposition results with semantic and multiview consistency. For “the front-point dominance module”, which improves semantic consistency, the method outperforms Semantic NeRF by 4.1%, and the ablation study shows an improvement of 3.9% over the variant without this module. For “the color classification layer module”, which improves the semantic and multiview consistency of the intrinsic decomposition, the method outperforms the intrinsic decomposition of Intrinsic NeRF by 10.2% and the variant without the color classification layer by 1.7%.
Conclusion
A novel NeRF classification network that learns the intrinsic properties and target characteristics of objects is proposed. Experiments show that the proposed method produces intrinsic decomposition results with semantic and multiview consistency. The constructed albedo-classification implicit field can describe the geometric relationships of complex scenes and generalizes well in appearance. Realistic, multiview-consistent results are achieved in scene recoloring, relighting, and shadow and specular editing.
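Once per-pixel albedo, shading, and a separate residual layer are available, the editing tasks listed above reduce to simple per-pixel operations. The sketch below assumes the composition I = A · S + E, where E is a residual layer holding effects such as highlights. This model, the function names, and the label-based recoloring rule are assumptions for illustration; the abstract does not specify how the paper recombines color, shading map, and direct illumination.

```python
from typing import List

Pixel = List[float]  # (r, g, b)


def recompose(albedo: List[Pixel], shading: List[float], residual: List[Pixel]) -> List[Pixel]:
    """Assumed image model I = A * S + E, clipped to [0, 1]."""
    return [[min(1.0, max(0.0, a * s + e)) for a, e in zip(px, res)]
            for px, s, res in zip(albedo, shading, residual)]


def recolor(albedo: List[Pixel], labels: List[int], target: int, new_color: Pixel) -> List[Pixel]:
    """Replace the albedo of every pixel whose semantic label equals `target`."""
    return [list(new_color) if label == target else list(px) for px, label in zip(albedo, labels)]


def relight(shading: List[float], gain: float) -> List[float]:
    """Scale the shading layer uniformly (gain < 1 darkens, gain > 1 brightens)."""
    return [s * gain for s in shading]


def remove_highlights(residual: List[Pixel]) -> List[Pixel]:
    """Zero out the residual layer assumed to hold highlights."""
    return [[0.0] * len(px) for px in residual]


albedo = [[0.5, 0.5, 0.5], [0.25, 0.5, 0.75]]
shading = [1.0, 0.5]
residual = [[0.25, 0.25, 0.25], [0.0, 0.0, 0.0]]  # a highlight on pixel 0
labels = [1, 2]  # hypothetical semantic class ids

edited = recompose(
    recolor(albedo, labels, target=1, new_color=[1.0, 0.0, 0.0]),
    relight(shading, 0.5),
    remove_highlights(residual),
)
print(edited)  # -> [[0.5, 0.0, 0.0], [0.0625, 0.125, 0.1875]]
```

Each edit touches exactly one layer, which is the practical payoff of disentanglement: recoloring never alters the lighting, and relighting never alters the material colors.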
Baslamisli A S, Groenestege T T, Das P, Le H A, Karaoglu S and Gevers T. 2018. Joint learning of intrinsic images and semantic segmentation//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 286-302 [DOI: 10.1007/978-3-030-01231-1_18]
Boss M, Braun R, Jampani V, Barron J T, Liu C and Lensch H P A. 2021. NeRD: neural reflectance decomposition from image collections//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 12684-12694 [DOI: 10.1109/ICCV48922.2021.01245]
Carbonneau M A, Zaïdi J, Boilard J and Gagnon G. 2024. Measuring disentanglement: a review of metrics. IEEE Transactions on Neural Networks and Learning Systems, 35(7): 8747-8761 [DOI: 10.1109/tnnls.2022.3218982]
Dong B, Dong Y, Tong X and Peers P. 2015. Measurement-based editing of diffuse albedo with consistent interreflections. ACM Transactions on Graphics, 34(4): #112 [DOI: 10.1145/2766979]
Fan Q N, Yang J L, Hua G, Chen B Q and Wipf D. 2018. Revisiting deep intrinsic image decompositions//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 8944-8952 [DOI: 10.1109/cvpr.2018.00932]
Grosse R, Johnson M K, Adelson E H and Freeman W T. 2009. Ground truth dataset and baseline evaluations for intrinsic image algorithms//Proceedings of the 12th IEEE International Conference on Computer Vision. Kyoto, Japan: IEEE: 2335-2342 [DOI: 10.1109/ICCV.2009.5459428]
Insafutdinov E and Dosovitskiy A. 2018. Unsupervised learning of shape and pose with differentiable point clouds//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montreal, Canada: Curran Associates Inc.: 2807-2817
Kajiya J T and von Herzen B P. 1984. Ray tracing volume densities. ACM SIGGRAPH Computer Graphics, 18(3): 165-174 [DOI: 10.1145/964965.808594]
Land E H and McCann J J. 1971. Lightness and retinex theory. Journal of the Optical Society of America, 61(1): 1-11 [DOI: 10.1364/josa.61.000001]
Liu A, Ginosar S, Zhou T H, Efros A A and Snavely N. 2020a. Learning to factorize and relight a city//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 544-561 [DOI: 10.1007/978-3-030-58548-8_32]
Liu L J, Gu J T, Zaw Lin K, Chua T S and Theobalt C. 2020b. Neural sparse voxel fields//Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc.: 15651-15663
Liu S, Zhang X M, Zhang Z T, Zhang R, Zhu J Y and Russell B. 2021. Editing conditional radiance fields//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 5773-5783 [DOI: 10.1109/iccv48922.2021.00572]
Lorensen W E and Cline H E. 1987. Marching cubes: a high resolution 3D surface construction algorithm. ACM SIGGRAPH Computer Graphics, 21(4): 163-169 [DOI: 10.1145/37402.37422]
Luo J D, Huang Z Y, Li Y J, Zhou X W, Zhang G F and Bao H J. 2020. NIID-net: adapting surface normal knowledge for intrinsic image decomposition in indoor scenes. IEEE Transactions on Visualization and Computer Graphics, 26(12): 3434-3445 [DOI: 10.1109/TVCG.2020.3023565]
Mescheder L, Oechsle M, Niemeyer M, Nowozin S and Geiger A. 2019. Occupancy networks: learning 3D reconstruction in function space//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 4460-4470 [DOI: 10.1109/cvpr.2019.00459]
Mildenhall B, Srinivasan P P, Tancik M, Barron J T, Ramamoorthi R and Ng R. 2022. NeRF: representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1): 99-106 [DOI: 10.1145/3503250]
Narihira T, Maire M and Yu S X. 2015. Direct intrinsics: learning albedo-shading decomposition by convolutional regression//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 2992-2992 [DOI: 10.1109/iccv.2015.342]
Ning X J, Gong L, Han Y, Ma T, Shi Z H, Jin H Y and Wang Y H. 2023. Semantic segmentation and model matching-integrated indoor scenario-relevant reconstruction method. Journal of Image and Graphics, 28(10): 3149-3162 [DOI: 10.11834/jig.220518]
Oleynikova H, Millane A, Taylor Z, Galceran E, Nieto J and Siegwart R. 2016. Signed distance fields: a natural representation for both mapping and planning//Proceedings of 2016 RSS Workshop: Geometry and Beyond: Representations, Physics, and Scene Understanding for Robotics. Ann Arbor, USA: University of Michigan: #134 [DOI: 10.3929/ethz-a-010820134]
Pharr M and Humphreys G. 2010. Physically Based Rendering: From Theory to Implementation. 2nd ed. San Francisco, USA: Morgan Kaufmann Publishers Inc.
Porter T and Duff T. 1984. Compositing digital images//Proceedings of the 11th Annual Conference on Computer Graphics and Interactive Techniques. New York, USA: Association for Computing Machinery: 253-259 [DOI: 10.1145/800031.808606]
Seitz S M, Matsushita Y and Kutulakos K N. 2005. A theory of inverse light transport//Proceedings of the 10th IEEE International Conference on Computer Vision. Beijing, China: IEEE: 1440-1447 [DOI: 10.1109/ICCV.2005.25]
Shi J, Dong Y, Su H and Yu S X. 2017. Learning non-Lambertian object intrinsics across ShapeNet categories//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: 1685-1694 [DOI: 10.1109/CVPR.2017.619]
Sitzmann V, Zollhöfer M and Wetzstein G. 2019. Scene representation networks: continuous 3D-structure-aware neural scene representations//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc.: #101
Straub J, Whelan T, Ma L N, Chen Y F, Wijmans E, Green S, Engel J J, Mur-Artal R, Ren C, Verma S, Clarkson A, Yan M F, Budge B, Yan Y J, Pan X Q, Yon J, Zou Y Y, Leon K, Carter N, Briales J, Gillingham T, Mueggler E, Pesqueira L, Savva M, Batra D, Strasdat H M, De Nardi R, Goesele M, Lovegrove S and Newcombe R. 2019. The Replica dataset: a digital replica of indoor spaces [EB/OL]. [2024-03-09]. https://arxiv.org/pdf/1906.05797.pdf
Wan J H, Liu X P, Chen L L, Ao S, Zhang P and Guo Y L. 2024. Geometric attribute-guided 3D semantic instance reconstruction. Journal of Image and Graphics, 29(1): 218-230 [DOI: 10.11834/jig.230106]
Wang N Y, Zhang Y D, Li Z W, Fu Y W, Liu W and Jiang Y G. 2018. Pixel2Mesh: generating 3D mesh models from single RGB images//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 55-71 [DOI: 10.1007/978-3-030-01252-6_4]
Wang P, Liu L J, Liu Y, Theobalt C, Komura T and Wang W P. 2021. NeuS: learning neural implicit surfaces by volume rendering for multi-view reconstruction//Proceedings of the 35th International Conference on Neural Information Processing Systems. [s.l.]: Curran Associates Inc.: #2081
Wang Y J, Fan Q N, Li K, Chen D D, Yang J Y, Lu J Z, Lischinski D and Chen B Q. 2022. High quality rendered dataset and non-local graph convolutional network for intrinsic image decomposition. Journal of Image and Graphics, 27(2): 404-420 [DOI: 10.11834/jig.210705]
Yariv L, Kasten Y, Moran D, Galun M, Atzmon M, Basri R and Lipman Y. 2020. Multiview neural surface reconstruction by disentangling geometry and appearance//Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc.: #210
Ye G Z, Garces E, Liu Y B, Dai Q H and Gutierrez D. 2014. Intrinsic video and applications. ACM Transactions on Graphics, 33(4): #80 [DOI: 10.1145/2601097.2601135]
Ye W C, Chen S, Bao C, Bao H J, Pollefeys M, Cui Z P and Zhang G F. 2023. IntrinsicNeRF: learning intrinsic neural radiance fields for editable novel view synthesis//Proceedings of 2023 IEEE/CVF International Conference on Computer Vision. Paris, France: IEEE: 339-351 [DOI: 10.1109/iccv51070.2023.00038]
Yen-Chen L, Florence P, Barron J T, Rodriguez A, Isola P and Lin T Y. 2021. iNeRF: inverting neural radiance fields for pose estimation//Proceedings of 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems. Prague, Czech Republic: IEEE: 1323-1330 [DOI: 10.1109/iros51168.2021.9636708]
Zhang K, Luan F J, Wang Q Q, Bala K and Snavely N. 2021a. PhySG: inverse rendering with spherical Gaussians for physics-based material editing and relighting//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 5453-5462 [DOI: 10.1109/cvpr46437.2021.00541]
Zhang X M, Srinivasan P P, Deng B Y, Debevec P, Freeman W T and Barron J T. 2021b. NeRFactor: neural factorization of shape and reflectance under an unknown illumination. ACM Transactions on Graphics, 40(6): #237 [DOI: 10.1145/3478513.3480496]
Zhi S F, Laidlow T, Leutenegger S and Davison A J. 2021. In-place scene labelling and understanding with implicit scene representation//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 15838-15847 [DOI: 10.1109/iccv48922.2021.01554]
Zhou T H, Krähenbühl P and Efros A A. 2015. Learning data-driven reflectance priors for intrinsic image decomposition//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 3469-3477 [DOI: 10.1109/iccv.2015.396]