A Novel-view Synthesis Method Integrating Local Spatial Information
2025, pp. 1-15
Received: 2024-11-04; Revised: 2025-02-07; Accepted: 2025-02-25; Published online: 2025-02-26
DOI: 10.11834/jig.240673
Objective
Point cloud-based neural rendering methods are sensitive to point cloud quality and feature extraction, which easily degrades the rendering quality of synthesized novel-view images. To address this, this paper proposes a novel-view synthesis method that integrates local spatial information.
Method
To address the problems of point cloud quality and insufficient extracted features, this paper first presents a neural point cloud feature alignment module, which aligns the features of the point cloud with those of the matched image regions and fuses them into a neural point cloud, improving the local expressiveness of its features. Second, a neural point cloud Transformer module is proposed to fuse the contextual information of the local neural point cloud, so that reliable local spatial information can still be extracted when point cloud quality is poor, effectively enhancing the synthesis quality of point cloud-based neural rendering; a sketch of the alignment step follows.
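To illustrate the alignment idea, the following is a minimal sketch, not the paper's actual implementation: each 3D point is projected into a source view, multi-scale image features (e.g., from the feature pyramid network described later) are sampled at the projection, and the samples are concatenated with the point's own feature. All function and variable names, and the choice of bilinear sampling, are assumptions made for this example.

import torch
import torch.nn.functional as F

def align_point_features(points, point_feats, fpn_feats, K, Rt):
    """Hypothetical sketch. points: (N, 3) world coordinates; point_feats: (N, C);
    fpn_feats: list of (1, C, H_l, W_l) feature-pyramid levels; K, Rt: camera
    intrinsics and world-to-camera extrinsics."""
    # Project the points into the image plane with a pinhole model.
    cam = (Rt[:3, :3] @ points.T + Rt[:3, 3:]).T        # (N, 3) camera coordinates
    uvw = (K @ cam.T).T                                 # (N, 3) homogeneous pixels
    pix = uvw[:, :2] / uvw[:, 2:].clamp(min=1e-6)       # perspective divide

    sampled = []
    for level in fpn_feats:
        _, _, H, W = level.shape
        # Normalize pixel coordinates to [-1, 1], as grid_sample requires.
        grid = torch.stack([pix[:, 0] / (W - 1) * 2 - 1,
                            pix[:, 1] / (H - 1) * 2 - 1], dim=-1).view(1, -1, 1, 2)
        feat = F.grid_sample(level, grid, align_corners=True)  # (1, C, N, 1)
        sampled.append(feat[0, :, :, 0].T)                     # (N, C) per level

    # Concatenate the point's own feature with the aligned multi-scale image
    # features; a small fusion network would turn this into the neural point feature.
    return torch.cat([point_feats] + sampled, dim=-1)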
Result
Experimental results show that on real-scene datasets, for Tanks and Temples, whose scenes each contain a single object, the proposed method improves the peak signal-to-noise ratio (PSNR) by 19.2% over NeRF, and by 6.4% and 3.8% over Tetra-NeRF and Point-NeRF, respectively, two methods that also take point clouds as input. Even on the more complex ScanNet dataset, it improves over NeRF and Point-NeRF by 34.6% and 2.1%, respectively.
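Assuming these percentages denote relative PSNR gains over each baseline (the usual reading of such figures), they follow

\Delta = \frac{\mathrm{PSNR}_{\text{ours}} - \mathrm{PSNR}_{\text{baseline}}}{\mathrm{PSNR}_{\text{baseline}}} \times 100\%.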
Conclusion
The proposed method makes better use of the local spatial information of the point cloud and effectively mitigates the rendering-quality degradation caused by point cloud quality and extracted features under sparse-view image input. The experimental results verify the effectiveness of the proposed method.
Objective
Modeling real-world scenes from image data and generating photorealistic novel views are significant challenges in computer vision and graphics. NeRF and its extensions have emerged as highly successful approaches to this problem by leveraging neural radiance fields. However, these methods typically reconstruct the radiance field across the entire space with global MLPs via ray marching, resulting in prolonged reconstruction times. This cost is primarily attributable to the slow fitting of per-scene networks and the excessive sampling of large empty regions. To address these problems, point cloud-based neural radiance field representations have been proposed, which model the scene with 3D points. Unlike NeRF, which relies purely on per-scene fitting, a point cloud-based neural radiance field can be efficiently initialized by a feed-forward deep neural network pre-trained across scenes. Furthermore, ray sampling in empty scene space is avoided by using a classical point cloud that approximates the actual scene geometry. However, point cloud-based neural radiance fields are affected by the quality of the point cloud, and the extracted image features also influence the rendering quality of novel-view images. To this end, a novel-view synthesis method integrating local spatial information is proposed, built around two key ideas: aligning point cloud features and fusing the contextual information of the local point cloud.
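For reference, the discrete volume-rendering formulation that NeRF-style methods, including point cloud-based radiance fields, use to composite per-sample densities and colors along a camera ray (cf. Kajiya and Von Herzen 1984; Mildenhall et al. 2020) is

C(\mathbf{r}) = \sum_{i=1}^{M} T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i, \qquad T_i = \exp\left(-\sum_{j=1}^{i-1} \sigma_j \delta_j\right),

where \sigma_i and \mathbf{c}_i are the volume density and color predicted at the i-th sample along the ray and \delta_i is the distance between adjacent samples.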
Method
The network architecture of this paper comprises a neural point cloud generation network and a point cloud-based neural radiance field. In the neural point cloud generation network, a depth prediction network produces the point cloud and per-point confidence. The image is processed by a feature pyramid network to extract features at different scales, and the neural point cloud feature alignment module then integrates the features derived from the point cloud and the image. Aligning the features extracts the semantic information of the image, enabling the network to adapt more effectively to the structural and textural characteristics of different scenes. Neural point clouds are formed by combining the points, the confidence values, and the image features. In the neural radiance field built on the neural point cloud, the RGB value and volume density of each sampling point are predicted by aggregating the features of the neural points near that sampling point. A Transformer layer fuses the contextual information of the local neural point cloud to better capture spatial and geometric detail, and high-quality synthetic images are produced through volume rendering.
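The aggregation step can be pictured with the following minimal sketch, which is an illustration under assumptions rather than the paper's code: for each ray sample, a small Transformer encoder mixes the features of its K nearest neural points (with their positional offsets encoded), and density and RGB are regressed from the pooled result. Module sizes, the mean pooling, and all names are hypothetical.

import torch
import torch.nn as nn

class LocalPointAggregator(nn.Module):
    """Hypothetical aggregator: attends over the K nearest neural points of each
    ray sample and regresses volume density and RGB."""
    def __init__(self, feat_dim=64, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.rel_pos = nn.Linear(3, feat_dim)       # encode sample-to-point offsets
        self.density_head = nn.Linear(feat_dim, 1)
        self.color_head = nn.Linear(feat_dim, 3)

    def forward(self, neighbor_feats, neighbor_offsets):
        # neighbor_feats: (S, K, C) features of the K nearest neural points per sample;
        # neighbor_offsets: (S, K, 3) offsets from each sample to those points.
        tokens = neighbor_feats + self.rel_pos(neighbor_offsets)  # inject local geometry
        mixed = self.encoder(tokens)                              # (S, K, C) context mixing
        pooled = mixed.mean(dim=1)                                # (S, C) per-sample summary
        sigma = torch.relu(self.density_head(pooled))             # (S, 1) volume density
        rgb = torch.sigmoid(self.color_head(pooled))              # (S, 3) color
        return sigma, rgb

# The outputs feed the volume-rendering sum shown after the Objective section:
agg = LocalPointAggregator()
sigma, rgb = agg(torch.randn(1024, 8, 64), torch.randn(1024, 8, 3))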
Result
To ensure the reliability of the training and testing procedure, the experimental environment is built on Ubuntu 18.04 with an Intel Core i9-10900 CPU, 32 GB of memory, and an RTX 3090 graphics card. The experiments primarily use the peak signal-to-noise ratio (PSNR) as the evaluation metric, supplemented by the structural similarity index measure (SSIM) and the learned perceptual image patch similarity (LPIPS). Network training uses the Adam adaptive learning rate optimizer; by dynamically adjusting the learning rate, the network can better balance convergence speed and stability during training. The initial learning rate is set to 0.0005, and the decay rate parameters are set to 0.9 and 0.99. Four widely used datasets are employed: DTU, NeRF Synthetic, Tanks and Temples, and ScanNet. DTU is a dataset of indoor scenes; each scene comprises 49 camera viewpoints with 7 brightness levels per viewpoint, at a resolution of 512 x 640 pixels. NeRF Synthetic is a synthetic dataset containing eight scenes, each with 100 training images and 200 test images, fully rendered with Blender. ScanNet is an indoor scanning dataset; Scene 241 and Scene 101 are used for evaluation, with 20% of the total image count allocated for training (1,463 images for Scene 241 and 1,000 images for Scene 101) and the remaining images used for evaluation. Tanks and Temples is an extensive collection of indoor scene data comprising 14 distinct scenes. Experimental results show that on real-scene datasets, for the Tanks and Temples scenes containing only a single object, the PSNR of the proposed method is 19.2% higher than that of NeRF, and 6.4% and 3.8% higher than those of Tetra-NeRF and Point-NeRF, respectively, both of which also use point cloud input. Even on the more complex ScanNet dataset, it improves over NeRF and Point-NeRF by 34.6% and 2.1%, respectively.
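The stated optimizer settings and the primary metric translate directly into code; a minimal sketch follows, where "model" is a stand-in for the network described above and the PSNR helper assumes images scaled to [0, 1].

import torch

def make_optimizer(model):
    # Adam with the reported hyperparameters: initial learning rate 0.0005,
    # decay rate (beta) parameters 0.9 and 0.99.
    return torch.optim.Adam(model.parameters(), lr=5e-4, betas=(0.9, 0.99))

def psnr(pred, target, max_val=1.0):
    # Peak signal-to-noise ratio in dB, assuming images scaled to [0, max_val].
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)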
Conclusion
This paper presents a novel-view synthesis method integrating local spatial information that, through a neural point cloud feature alignment module, dynamically adjusts the aligned features of the neural point cloud. When points are matched with aligned features, our approach improves the precision of this procedure by extracting image features at multiple scales together with the semantic information they carry. The neural point cloud Transformer module enhances the network's ability to extract spatial position and geometric information from the neural point cloud by incorporating contextual information from nearby sampling points; this is particularly useful when dealing with points of varying quality and shape. Experimental results on the Tanks and Temples, NeRF Synthetic, and ScanNet datasets show that the method outperforms existing point cloud-based neural radiance field approaches in both visual quality and evaluation metrics. In short, the method outlined here improves the fusion of point cloud and image features and exploits the contextual information in local point cloud features to help the network merge sparse point cloud features. This yields more lifelike and distinctive detail in the synthesized images and produces high-quality scene renderings from input images covering only a small number of views.
Aliev K A, Sevastopolsky A, Kolos M, Ulyanov D and Lempitsky V. 2020. Neural point-based graphics // Proceedings of the 16th European Conference on Computer Vision (ECCV). Glasgow, UK: Springer: 696-712 [DOI: 10.1007/978-3-030-58542-6_42]
Cheng S, Xu Z, Zhu S, Li Z, Li L E, Ramamoorthi R and Su H. 2020. Deep stereo using adaptive thin volume representation with uncertainty awareness // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, WA, USA: IEEE: 2521-2531 [DOI: 10.1109/CVPR42600.2020.00260]
Chen A, Xu Z, Zhao F, Zhang X, Xiang F, Yu J and Su H. 2021. MVSNeRF: fast generalizable radiance field reconstruction from multi-view stereo // Proceedings of 2021 IEEE International Conference on Computer Vision (ICCV). Montreal, QC, Canada: IEEE: 14104-14113 [DOI: 10.1109/ICCV48922.2021.01386]
Chung J, Oh J and Lee K M. 2024. Depth-regularized optimization for 3D Gaussian splatting in few-shot images // Proceedings of 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Seattle, WA, USA: IEEE: 811-820 [DOI: 10.1109/CVPRW63382.2024.00086]
Dai J, Qi H, Xiong Y, Li Y, Zhang G, Hu H and Wei Y. 2017. Deformable convolutional networks // Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: 764-773 [DOI: 10.1109/ICCV.2017.89]
Fridovich-Keil S, Yu A, Tancik M, Chen Q, Recht B and Kanazawa A. 2022. Plenoxels: radiance fields without neural networks // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, LA, USA: IEEE: 5491-5500 [DOI: 10.1109/CVPR52688.2022.00542]
Govindarajan S, Sambugaro Z, Shabanov A, Takikawa T, Rebain D, Sun W, Conci N, Yi K M and Tagliasacchi A. 2024. Lagrangian hashing for compressed neural field representations // Proceedings of the 18th European Conference on Computer Vision (ECCV). Milan, Italy: Springer: 183-199 [DOI: 10.1007/978-3-031-73383-3_11]
Jain A, Tancik M and Abbeel P. 2021. Putting NeRF on a diet: semantically consistent few-shot view synthesis // Proceedings of 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, QC, Canada: IEEE: 5865-5874 [DOI: 10.1109/ICCV48922.2021.00583]
Kerbl B, Kopanas G, Leimkühler T and Drettakis G. 2023. 3D Gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4): Article 139 [DOI: 10.1145/3592433]
Kulhanek J and Sattler T. 2023. Tetra-NeRF: representing neural radiance fields using tetrahedra // Proceedings of 2023 IEEE International Conference on Computer Vision (ICCV). Paris, France: IEEE: 18412-18423 [DOI: 10.1109/ICCV51070.2023.01692]
Kajiya J T and Von Herzen B P. 1984. Ray tracing volume densities. ACM SIGGRAPH Computer Graphics, 18(3): 165-174 [DOI: 10.1145/964965.808594]
Liu X N, Chen L Y, Hu X J and Yu H Y. 2024. Virtual viewpoint image synthesis using neural radiance fields with depth information supervision. Journal of Image and Graphics, 29(7): 2035-2045 [DOI: 10.11834/jig.221188]
Lin T Y, Dollar P, Girshick R, He K, Hariharan B and Belongie S. 2017. Feature pyramid networks for object detection // Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI, USA: IEEE: 936-944 [DOI: 10.1109/CVPR.2017.106]
Lombardi S, Simon T, Saragih J, Schwartz G, Lehrmann A and Sheikh Y. 2019. Neural volumes: learning dynamic renderable volumes from images. ACM Transactions on Graphics, 38(4): 1-14 [DOI: 10.1145/3306346.3323020]
Liu L, Gu J, Zaw Lin K, Chua T S and Theobalt C. 2020. Neural sparse voxel fields // Proceedings of the 34th International Conference on Neural Information Processing Systems (NeurIPS). Red Hook, NY, USA: Curran Associates: 15651-15663
Mildenhall B, Srinivasan P, Tancik M, Barron J T, Ramamoorthi R and Ng R. 2020. NeRF: representing scenes as neural radiance fields for view synthesis // Proceedings of the 16th European Conference on Computer Vision (ECCV). Glasgow, UK: Springer: 405-421 [DOI: 10.1007/978-3-030-58452-8_24]
Müller T, Evans A, Schied C and Keller A. 2022. Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics, 41(4): 1-15 [DOI: 10.1145/3528223.3530127]
Rakhimov R, Ardelean A T, Lempitsky V and Burnaev E. 2022. NPBG++: accelerating neural point-based graphics // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, LA, USA: IEEE: 15948-15958 [DOI: 10.1109/CVPR52688.2022.01550]
Verbin D, Hedman P, Mildenhall B, Zickler T, Barron J T and Srinivasan P P. 2022. Ref-NeRF: structured view-dependent appearance for neural radiance fields // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, LA, USA: IEEE: 5481-5490 [DOI: 10.1109/CVPR52688.2022.00541]
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser L and Polosukhin I. 2017. Attention is all you need // Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS). Long Beach, USA: NIPS: 6000-6010
Wang Z, Yang W, Cao J, Hu Q, Xu L, Yu J and Yu J. 2023. NeReF: neural refractive field for fluid surface reconstruction and rendering // Proceedings of 2023 IEEE International Conference on Computational Photography (ICCP). Madison, WI, USA: IEEE: 1-11 [DOI: 10.1109/ICCP56744.2023.10233838]
Xiao Q, Chen M L, Zhang H and Huang X H. 2024. Neural radiance field reconstruction for sparse indoor panoramas. Journal of Image and Graphics, 29(9): 2596-2609 [DOI: 10.11834/jig.230643]
Xu D, Jiang Y, Wang P, Fan Z, Shi H and Wang Z. 2022. SinNeRF: training neural radiance fields on complex scenes from a single image // Proceedings of the 17th European Conference on Computer Vision (ECCV). Tel Aviv, Israel: Springer: 736-753 [DOI: 10.1007/978-3-031-20047-2_42]
Xu Q, Xu Z, Philip J, Bi S, Shu Z, Sunkavalli K and Neumann U. 2022. Point-NeRF: point-based neural radiance fields // Proceedings of 2022 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, LA, USA: IEEE: 5428-5438 [DOI: 10.1109/CVPR52688.2022.00536]
Yu A, Li R, Tancik M, Li H, Ng R and Kanazawa A. 2021. PlenOctrees for real-time rendering of neural radiance fields // Proceedings of 2021 IEEE International Conference on Computer Vision (ICCV). Montreal, QC, Canada: IEEE: 5732-5741 [DOI: 10.1109/ICCV48922.2021.00570]
Yao Y, Luo Z X, Li S W, Fang T and Quan L. 2018. MVSNet: depth inference for unstructured multi-view stereo // Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 785-801 [DOI: 10.1007/978-3-030-01237-3_47]
Yu A, Ye V, Tancik M and Kanazawa A. 2021. pixelNeRF: neural radiance fields from one or few images // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, TN, USA: IEEE: 4576-4585 [DOI: 10.1109/CVPR46437.2021.00455]
Zhu X, Hu H, Lin S and Dai J. 2019. Deformable ConvNets v2: more deformable, better results // Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, CA, USA: IEEE: 9300-9308 [DOI: 10.1109/CVPR.2019.00953]
Zhang Y, Huang X, Ni B, Zhang W and Li T. 2023. Frequency-modulated point cloud rendering with easy editing // Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, BC, Canada: IEEE: 119-129 [DOI: 10.1109/CVPR52729.2023.00020]