Example-based 3D Stylized and Structured Face Modeling
2024, pp. 1-14
Online publication date: 2024-09-18
DOI: 10.11834/jig.240380
Hu Jiaping, Zhou Yang. Example-based 3D Stylized and Structured Face Modeling[J]. Journal of Image and Graphics, 2024: 1-14. DOI: 10.11834/jig.240380
Existing 3D face stylization methods struggle to generate face views under large camera poses, or stop at producing multi-angle face views rather than structured 3D mesh models. This paper proposes an example-based method for 3D face stylization and structured modeling. The method can not only synthesize novel stylized face views under the full range of camera angles, but also generate structured 3D face models, i.e., a 3D face mesh together with its corresponding texture map. Specifically, we propose a two-stage framework for structured 3D stylized face generation, consisting of two main steps: domain adaptation of a 3D-aware face generator, and multi-view-constrained face texture optimization. First, we fine-tune the 3D-aware generator using a 2D face stylization data augmentation strategy. We then apply a view alignment strategy to align views rendered from the implicit neural field with those rendered from the 3D mesh, optimize the face model's texture map via gradient backpropagation under multi-view constraints, and finally fuse multiple texture maps to obtain the final texture map. Results show that the method effectively constructs high-quality structured 3D stylized face models and generates high-quality full-angle stylized face views and texture maps. Moreover, the explicitly constructed structured face models can be applied more conveniently to downstream tasks involving 3D faces.
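To make the view alignment step concrete, the following is a minimal PyTorch sketch of a mask-based affine alignment in the spirit described above (the extended abstract notes the transform is estimated from facial masks): a translation and isotropic scale are derived from the two masks' centroids and extents, and the mesh-rendered view is warped into the frame of the volume-rendered view. All function names are illustrative, and the paper's exact transform estimation may differ.

```python
import torch
import torch.nn.functional as F

def mask_stats(mask: torch.Tensor):
    """Centroid and RMS extent of a binary (H, W) mask, as Python floats."""
    ys, xs = torch.nonzero(mask > 0.5, as_tuple=True)
    cx, cy = xs.float().mean().item(), ys.float().mean().item()
    ext = (((xs - cx) ** 2 + (ys - cy) ** 2).float().mean()) ** 0.5
    return cx, cy, ext.item()

def align_mesh_view(mesh_img, mesh_mask, vol_mask):
    """Warp a (C, H, W) mesh-rendered image into the frame of the
    volume-rendered view (translation + isotropic scale), so the two
    views can be compared pixel-by-pixel."""
    _, H, W = mesh_img.shape
    cx_m, cy_m, e_m = mask_stats(mesh_mask)
    cx_v, cy_v, e_v = mask_stats(vol_mask)
    s = e_m / e_v  # mesh-mask extent per unit volume-mask extent
    # grid_sample maps output (volume-frame) coords to input (mesh) coords
    # in normalized [-1, 1] space: x_mesh = s * x_vol + t.
    tx = (2 * cx_m / (W - 1) - 1) - s * (2 * cx_v / (W - 1) - 1)
    ty = (2 * cy_m / (H - 1) - 1) - s * (2 * cy_v / (H - 1) - 1)
    theta = torch.tensor([[s, 0.0, tx], [0.0, s, ty]]).unsqueeze(0)
    grid = F.affine_grid(theta, (1, mesh_img.shape[0], H, W), align_corners=False)
    return F.grid_sample(mesh_img.unsqueeze(0), grid, align_corners=False)[0]
```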
Facial image stylization and 3D face modeling are important tasks in computer graphics and vision, with significant applications in virtual reality and social media, including popular technologies such as virtual live streaming, virtual imaging, and digital avatars. This paper addresses the task of 3D facial stylization and generation, aiming to produce novel, stylized facial views from a given real face image and a style reference. Novel views can be rendered at corresponding angles by specifying camera poses in 3D space; these views must maintain good 3D multi-view consistency while expressing the exaggerated geometry and colors characteristic of the given artistic style reference.

Existing methods for 3D facial generation can be broadly categorized into two types: those based on 3D deformable models and those based on implicit neural representations. Methods based on 3D deformable models often struggle to express non-facial components such as hairstyles and glasses, which severely limits the quality of the generated results. Methods based on implicit neural representations, while capable of achieving good generation results, tend to produce severely distorted facial views under large camera poses, such as side profiles. In addition, the outputs of implicit methods typically include only facial geometry and multi-view renderings, making them difficult to integrate with mature rendering pipelines and hindering their use in practical scenarios. Consequently, both families of methods struggle to produce high-quality 3D stylized facial models with well-structured modeling, i.e., 3D facial meshes together with topologically complete texture maps.

To address these shortcomings, this paper proposes a novel approach for 3D stylized facial generation and structured modeling. The goal is to train a 3D-aware stylized facial generator within the style domain of a specified artistic facial example. This generator should produce high-quality 3D facial views from any angle in the specified style, including large-pose side profiles and back views. Furthermore, from multi-view facial data, the generator should produce structured 3D facial models, including facial geometric meshes and the corresponding texture maps.

To achieve this, the paper proposes a two-stage method comprising two main steps: domain transfer of a 3D-aware facial generator, followed by multi-view-constrained facial texture optimization. In the first stage, a 2D facial stylization prior is used to augment the artistic style examples into a small-scale stylized facial dataset, and the camera pose and facial mask of each image in this dataset are extracted. The annotated stylized dataset is then used to fine-tune a 3D-aware generator pre-trained on natural faces; the fine-tuned generator can produce high-quality multi-view facial views and 3D mesh models.
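As a rough illustration of the first stage, the sketch below fine-tunes a pre-trained 3D-aware generator on the pose-annotated stylized dataset with a standard non-saturating adversarial loss. `Generator3D`, `Discriminator`, and `stylized_loader` are hypothetical stand-ins; the paper's actual training objective and regularization are not specified here and may differ.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins for a pre-trained 3D-aware GAN and the
# pose-annotated stylized dataset produced by the augmentation step.
G = Generator3D.load_pretrained("natural_faces.pt")    # hypothetical API
D = Discriminator.load_pretrained("natural_faces.pt")  # hypothetical API
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.0, 0.99))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.0, 0.99))

for real_img, cam_pose in stylized_loader:  # pose estimated per image
    z = torch.randn(real_img.size(0), G.z_dim)
    fake_img = G(z, cam_pose)  # render at the training image's pose

    # Discriminator step: non-saturating GAN loss in softplus form.
    d_loss = (F.softplus(D(fake_img.detach(), cam_pose)).mean()
              + F.softplus(-D(real_img, cam_pose)).mean())
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: push generated views toward the stylized domain.
    g_loss = F.softplus(-D(fake_img, cam_pose)).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```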
The second stage optimizes the facial texture using multi-view images rendered from a set of directions. The facial mesh is first smoothed and UV-unwrapped. To align the volume-rendered facial views with the differentiably rendered facial views for pixel-level texture optimization, the paper proposes a simple and effective facial view alignment strategy based on mask affine transformation. Multi-view facial supervision is then used to optimize the facial texture, and the final texture map is obtained through texture fusion.

To demonstrate the superiority of the proposed method, the two-stage approach is compared against existing state-of-the-art baselines, illustrating both the quality of 3D-aware stylized facial generation and the effectiveness of structured facial mesh generation. Ablation studies of the key components of each stage are also included; these studies illustrate the correctness of the method's design.

Qualitative and quantitative experiments show that the proposed method can effectively construct high-quality structured 3D stylized facial models and generate high-quality stylized facial views. Moreover, the explicitly modeled structured facial models can be applied more conveniently to downstream tasks related to 3D faces. The results indicate that the method not only achieves superior performance in generating stylized facial views but also ensures the structural integrity and practical applicability of the generated 3D facial models.

In conclusion, this paper presents a comprehensive approach to 3D stylized facial generation and structured modeling that addresses the limitations of existing methods. Through a two-stage process of domain transfer and multi-view-constrained texture optimization, the method achieves high-quality results in both facial view generation and structured modeling, and extensive experiments demonstrate its effectiveness, highlighting its potential for practical applications in virtual reality, social media, and related fields.
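The sketch below illustrates one plausible realization of the second stage, assuming a differentiable mesh renderer `render(mesh, texture, pose)` and the `align_mesh_view` helper sketched earlier; `init_texture`, `multi_view_set`, and `uv_visibility` are likewise hypothetical. Optimizing one texture per view and then fusing them with visibility weights is an assumption about the fusion rule, not the paper's exact formulation.

```python
import torch

# `render`, `mesh`, `init_texture`, `multi_view_set`, and `uv_visibility`
# are hypothetical; `align_mesh_view` is the helper sketched earlier.
view_textures, view_weights = [], []
for pose, target_img, target_mask in multi_view_set:
    texture = torch.nn.Parameter(init_texture.clone())  # (3, Ht, Wt) UV map
    opt = torch.optim.Adam([texture], lr=0.01)
    for _ in range(200):
        out, out_mask = render(mesh, texture, pose)        # differentiable
        out = align_mesh_view(out, out_mask, target_mask)  # mask alignment
        # Pixel-level loss against the generator's view, inside the mask.
        loss = ((out - target_img).abs() * target_mask).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    view_textures.append(texture.detach())
    view_weights.append(uv_visibility(mesh, pose))  # per-texel weight map

# Fuse the per-view textures, weighting each UV texel by how well the
# corresponding surface point was observed in that view.
w = torch.stack(view_weights)
fused = (torch.stack(view_textures) * w).sum(0) / w.sum(0).clamp(min=1e-6)
```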
Keywords: visual content synthesis; 3D portrait stylization; 3D-aware generation; domain adaptation; texture optimization