Face age synthesis fusing channel-coordinate attention mechanism and parallel dilated convolution
Vol. 28, Issue 12, Pages: 3870-3883 (2023)
Published: 16 December 2023
DOI: 10.11834/jig.230007
Zhang Ke, Yu Tingting, Shi Chaojun, Lou Wenshuo, Liu Yang. 2023. Face age synthesis fusing channel-coordinate attention mechanism and parallel dilated convolution. Journal of Image and Graphics, 28(12):3870-3883
Objective
Face age synthesis aims to synthesize face images of a specified age while preserving a highly credible identity, and it is one of the most active research directions in computer vision. However, current mainstream face age synthesis models focus too heavily on texture information and neglect the multi-scale features of the face; in addition, these networks screen identity information poorly. To address these problems, this paper proposes a face age synthesis network fusing a channel-coordinate attention mechanism and parallel dilated convolution (generative adversarial network (GAN) composed of the parallel dilated convolution and channel-coordinate attention mechanism, PDA-GAN).
Method
Built on a generative adversarial network, PDA-GAN introduces a parallel three-channel dilated convolution residual block and a channel-coordinate attention mechanism. The parallel three-channel dilated convolution residual block fuses face features extracted by dilated convolutions with three different dilation rates, increasing the diversity of feature scales and the overall richness of the features. The channel-coordinate attention mechanism computes saliency over the length, width, and depth of the face features to locate the channels and spatial regions that are highly correlated with age, which strengthens the network's ability to express channel-wise and position-wise sensitive features and resolves the feature-redundancy problem.
Result
The network is trained on the Flickr-Faces-HQ dataset (FFHQ) and tested on the large-scale CelebFaces attributes high-quality dataset (Celeba-HQ). The proposed PDA-GAN is compared qualitatively and quantitatively with three state-of-the-art face age synthesis networks to verify its effectiveness. Experimental results show that PDA-GAN markedly improves the identity confidence and age estimation accuracy of face age synthesis, demonstrating strong identity preservation and age manipulation capabilities.
Conclusion
The proposed method can synthesize target-age face images with high realism and accuracy.
Objective
Face age synthesis, which aims to synthesize face images of specified ages while maintaining high fidelity, is one of the most popular research topics in computer vision. With the continuous progress of science and technology, face age synthesis is gradually being applied in face recognition, film special effects, public security, and other fields with a wide range of application scenarios. The generative adversarial network (GAN) is one of the most widely used deep learning models in face synthesis: its generator and discriminator compete with each other until the generator produces images realistic enough to be mistaken for real ones. Although GAN and its variants have achieved good synthesis results, some deficiencies remain. First, to synthesize images close to the target age, current face age synthesis models restrict the aging process to texture information and ignore multi-scale facial features such as contour, hair color, and texture. Second, the limited receptive field of the convolutional layers prevents a fully convolutional network from extracting multi-scale features from the image. These problems greatly restrict the quality of the face age images synthesized by GANs. To solve them, this paper proposes a GAN composed of a parallel dilated convolution and a channel-coordinate attention mechanism (PDA-GAN).
Method
PDA-GAN builds on a generative adversarial network and introduces a parallel three-channel dilated convolutional residual block (PTDCRB) and a channel-coordinate attention mechanism (CCAM). PTDCRB is introduced in the generator network of the baseline. Each PTDCRB comprises three parallel dilated convolution branches that extract features simultaneously; the dilated convolutions on the different branches use dilation rates of 1, 2, and 3, respectively. The branches of the PTDCRB share weights, which reduces the number of network parameters. In each branch, the first layer is a 1 × 1 convolution, the second layer is a dilated convolution with the branch-specific dilation rate, and the third layer is another 1 × 1 convolution that reduces dimensionality and improves computational efficiency. Meanwhile, CCAM screens the channel dimension of the feature vector, retains meaningful channel information, and learns the importance of different channels to avoid feature redundancy. CCAM then embeds position information into the channel-attended feature vector and fuses the results of attention computed along the two orthogonal directions of length and width. The purpose of CCAM is to capture the dependencies of features at different positions.
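The two modules above are described only at an architectural level. The following is a minimal PyTorch sketch of how blocks of this kind could be wired up, assuming 2-D feature maps of shape (N, C, H, W); the channel-reduction ratios, the batch-normalization choice, and the averaging used to fuse the three branch outputs are illustrative assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PTDCRB(nn.Module):
    """Parallel three-channel dilated convolution residual block (sketch).

    The three parallel branches reuse the same 1x1 -> 3x3 -> 1x1 weights and
    differ only in the dilation rate of the middle convolution (1, 2, 3), so
    multi-scale features are extracted without tripling the parameter count.
    """

    def __init__(self, channels, dilations=(1, 2, 3)):
        super().__init__()
        mid = channels // 2                      # reduction ratio is an assumption
        self.dilations = dilations
        self.reduce = nn.Conv2d(channels, mid, kernel_size=1)
        self.dilated = nn.Conv2d(mid, mid, kernel_size=3, padding=1, bias=False)
        self.expand = nn.Conv2d(mid, channels, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        branches = []
        for d in self.dilations:
            y = self.act(self.reduce(x))         # 1x1: reduce dimensionality
            # Reuse the shared 3x3 kernel with this branch's dilation rate.
            y = self.act(F.conv2d(y, self.dilated.weight, padding=d, dilation=d))
            branches.append(self.expand(y))      # 1x1: restore dimensionality
        fused = torch.stack(branches, dim=0).mean(dim=0)  # fuse the three scales
        return self.act(x + fused)               # residual connection


class CCAM(nn.Module):
    """Channel-coordinate attention mechanism (sketch).

    Channel attention first re-weights feature channels; attention is then
    computed along the two orthogonal spatial directions (height and width)
    and fused, so the module captures both which channels and which positions
    are age-relevant.
    """

    def __init__(self, channels, reduction=16):
        super().__init__()
        mid = max(8, channels // reduction)      # reduction ratio is an assumption
        # Channel attention (squeeze-and-excitation style).
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, mid, 1), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1), nn.Sigmoid(),
        )
        # Coordinate attention along height and width.
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # -> (N, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # -> (N, C, 1, W)
        self.shared = nn.Sequential(
            nn.Conv2d(channels, mid, 1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
        )
        self.attn_h = nn.Conv2d(mid, channels, 1)
        self.attn_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        x = x * self.channel_gate(x)                        # channel re-weighting
        n, c, h, w = x.shape
        xh = self.pool_h(x)                                 # (N, C, H, 1)
        xw = self.pool_w(x).permute(0, 1, 3, 2)             # (N, C, W, 1)
        y = self.shared(torch.cat([xh, xw], dim=2))         # joint spatial encoding
        yh, yw = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.attn_h(yh))                # (N, C, H, 1)
        a_w = torch.sigmoid(self.attn_w(yw.permute(0, 1, 3, 2)))  # (N, C, 1, W)
        return x * a_h * a_w                                # position re-weighting


if __name__ == "__main__":
    feat = torch.randn(2, 64, 32, 32)            # dummy generator feature map
    feat = PTDCRB(64)(feat)
    feat = CCAM(64)(feat)
    print(feat.shape)                            # torch.Size([2, 64, 32, 32])
```

In this sketch the channel half of CCAM follows a squeeze-and-excitation pattern and the coordinate half pools along the height and width directions separately before fusing the two attention maps, which is one standard way to realize "channel attention followed by attention along two orthogonal directions" as described above.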
Result
The network is trained on the FFHQ dataset, samples from the Celeba-HQ dataset are selected as the test set, and PDA-GAN is compared qualitatively and quantitatively with three recent face age synthesis networks, HRFAE, LIFE, and SAM, to verify its effectiveness. Age accuracy and identity consistency are adopted as quantitative indicators. PDA-GAN achieves the best accuracy for synthesized age images, with an average prediction difference of 4.09, and its identity confidence reaches 99.2% when synthesizing a 30-year-old face. In the age-independent attribute retention experiment, PDA-GAN outperforms the other models in both quantitative indicators, with a gender retention rate of 99.7% and an emotion retention rate of 93.2%. An ablation experiment is performed to further prove the effectiveness of each module of PDA-GAN, in which PTDCRB is introduced into different layers of the generator backbone network. Experimental results show that PTDCRB-3 significantly improves identity confidence and age estimation accuracy. Four sets of PTDCRB dilation rates are then used to train the network, and the dilation rate set [1, 2, 3] proves optimal in terms of identity confidence and predicted age distribution. Finally, the standard generator and the generator with the channel-coordinate attention mechanism are compared in terms of age synthesis accuracy and identity verification confidence. Experimental results show that identity retention and age synthesis abilities improve significantly after the channel-coordinate attention mechanism is added.
Conclusion
This study proposes a parallel three-channel dilated convolution residual block with shared weights that captures feature information at each scale and enriches the model's detail features. To enhance the model's expressiveness on sensitive features, a channel-coordinate attention mechanism is proposed that learns features along the channel and spatial dimensions simultaneously. Under the combined effect of the parallel three-channel dilated convolution residual block and the channel-coordinate attention mechanism, the identity preservation ability and age synthesis accuracy of the model on face images are improved. Experimental results show that the proposed method outperforms other popular face age synthesis methods and can synthesize natural and realistic face images of the target age with high fidelity and accuracy.
Keywords: image synthesis; face age; generative adversarial network (GAN); dilated convolution; attention mechanism
Alaluf Y, Patashnik O and Cohen-Or D. 2021. Only a matter of style: age transformation using a style-based regression model. ACM Transactions on Graphics, 40(4): #45 [DOI: 10.1145/3450626.3459805]
Deng J K, Guo J, Xue N N and Zafeiriou S. 2019. ArcFace: additive angular margin loss for deep face recognition//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 4685-4694 [DOI: 10.1109/CVPR.2019.00482]
Feng S and Gao S J. 2022. Research on the prospect of face aging technology based on generative adversarial network in public security field. Electronic Test, 36(2): 61-65 [DOI: 10.16520/j.cnki.1000-8519.2022.02.041]
Fu Y, Guo G D and Huang T S. 2010. Age synthesis and estimation via faces: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(11): 1955-1976 [DOI: 10.1109/TPAMI.2010.36]
He Z, Kan M, Shan S and Chen X. 2019. S2GAN: share aging factors across ages and share aging trends among individuals//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE: 9440-9449 [DOI: 10.1109/ICCV.2019.00953]
Jeon S, Lee P, Hong K and Byun H. 2021. Continuous face aging generative adversarial networks//Proceedings of 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Toronto, Canada: IEEE: 1995-1999 [DOI: 10.1109/ICASSP39728.2021.9414429]
Karras T, Aila T, Laine S and Lehtinen J. 2018. Progressive growing of GANs for improved quality, stability, and variation [EB/OL]. [2023-01-15]. https://arxiv.org/pdf/1710.10196.pdf
Li P P, Hu Y B, Li Q, He R and Sun Z N. 2018. Global and local consistent age generative adversarial networks//Proceedings of the 24th International Conference on Pattern Recognition (ICPR). Beijing, China: IEEE: 1073-1078 [DOI: 10.1109/ICPR.2018.8545119]
Li Z Q, Jiang R W and Aarabi P. 2021. Continuous face aging via self-estimated residual age embedding//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 15003-15012 [DOI: 10.1109/CVPR46437.2021.01476]
Liu Y F, Li Q and Sun Z N. 2019a. Attribute-aware face aging with wavelet-based generative adversarial networks//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 11869-11878 [DOI: 10.1109/CVPR.2019.01215]
Liu Y F, Li Q, Sun Z N and Tan T N. 2019b. A3GAN: an attribute-aware attentive generative adversarial network for face aging. IEEE Transactions on Information Forensics and Security, 16: 2776-2790 [DOI: 10.1109/TIFS.2021.3065499]
Or-El R, Sengupta S, Fried O, Shechtman E and Kemelmacher-Shlizerman I. 2020. Lifespan age transformation synthesis//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 739-755 [DOI: 10.1007/978-3-030-58539-6_44]
Pumarola A, Agudo A, Martinez A M, Sanfeliu A and Moreno-Noguer F. 2018. GANimation: anatomically-aware facial animation from a single image//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 835-851 [DOI: 10.1007/978-3-030-01249-6_50]
Rothe R, Timofte R and Van Gool L. 2018. Deep expectation of real and apparent age from a single image without facial landmarks. International Journal of Computer Vision, 126(2): 144-157 [DOI: 10.1007/s11263-016-0940-3]
Tang X, Wang Z W, Luo W X and Gao S H. 2018. Face aging with identity-preserved conditional generative adversarial networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA: IEEE: 7939-7947 [DOI: 10.1109/CVPR.2018.00828]
Wu L W, Sun R, Kan J S and Gao J. 2020. Double dual generative adversarial networks for cross-age sketch-to-photo translation. Journal of Image and Graphics, 25(4): 732-744 [DOI: 10.11834/jig.190329]
Yang H Y, Huang D, Wang Y H and Jain A K. 2018. Learning face age progression: a pyramid architecture of GANs//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA: IEEE: 31-39 [DOI: 10.1109/CVPR.2018.00011]
Yao X, Puy G, Newson A, Gousseau Y and Hellier P. 2021. High resolution face age editing//Proceedings of the 25th International Conference on Pattern Recognition (ICPR). Milan, Italy: IEEE: 8624-8631 [DOI: 10.1109/ICPR48806.2021.9412383]
Zhang K, Su Y K, Guo X W, Qi L and Zhao Z B. 2021. MU-GAN: facial attribute editing based on multi-attention mechanism. IEEE/CAA Journal of Automatica Sinica, 8(9): 1614-1626 [DOI: 10.1109/JAS.2020.1003390]
Zhang K, Wang X S, Guo Y R, Su Y K and He Y X. 2019. Survey of deep learning methods for face age estimation. Journal of Image and Graphics, 24(8): 1215-1230 [DOI: 10.11834/jig.180653]
Zhang R, Isola P, Efros A A, Shechtman E and Wang O. 2018. The unreasonable effectiveness of deep features as a perceptual metric//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 586-595 [DOI: 10.1109/CVPR.2018.00068]
Zhu H P, Huang Z Z, Shan H M and Zhang J P. 2020. Look globally, age locally: face aging with an attention mechanism//Proceedings of 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Barcelona, Spain: IEEE: 1963-1967 [DOI: 10.1109/ICASSP40776.2020.9054553]