前景—背景语义解耦的图像修复
Image inpainting with foreground-background semantic decoupling
2025, Vol. 30, No. 2, pp. 533-545
Print publication date: 2025-02-16
DOI: 10.11834/jig.240165
叶学义, 睢明聪, 薛智权, 王佳欣, 陈华华. 2025. 前景—背景语义解耦的图像修复. 中国图象图形学报, 30(02):0533-0545
Ye Xueyi, Sui Mingcong, Xue Zhiquan, Wang Jiaxin, Chen Huahua. 2025. Image inpainting with foreground-background semantic decoupling. Journal of Image and Graphics, 30(02):0533-0545
目的
修复前后的图像在语义上保持一致是图像修复研究遵循的基本规则之一。然而,现有的图像修复方法往往忽视了图像前景与背景间的语义区别,使得修复过程中二者相互影响,导致边缘模糊和语义混杂等问题。针对此问题,提出了一种前景—背景语义解耦的图像修复方法。
方法
此方法由3个步骤组成:语义修复、前景修复以及整体修复。初始阶段,对缺失的语义标签图进行修补;随后,采用修复后的语义图将缺损图像的前景与背景分离,并将受损的前景区域输入前景修复模块进行修复;最终,将修复后的前景区域嵌入到缺损图像中,输入整体修复模块完成整体修复及前景背景融合。
结果
在公开的CelebA-HQ人脸数据集和Cityscapes街景数据集上与现有同类方法进行比较,本文方法在学习感知图像块相似度、峰值信噪比和结构相似性指标上表现更好;相较于对比方法的最优平均值,在CelebA-HQ数据集上,学习感知图像块相似度降低8.86%,结构相似性提高1.10%,且此方法峰值信噪比均值达到27.09 dB;在Cityscapes数据集上,学习感知图像块相似度降低4.62%,结构相似性提高0.45%,且此方法峰值信噪比均值达到27.31 dB。消融实验的数据表明了算法各个环节的必要性和有效性。
结论
该图像修复方法通过将前景背景的语义解耦,采用三段式算法流程递进完成图像修复,有效减少了语义混乱和边界模糊的影响,修复后生成的图像前景背景边界清晰,颜色风格和谐,语义连贯。
Objective
Image inpainting is a technique that infers and repairs damaged or missing regions of an image on the basis of its known content. It originated from artists restoring damaged paintings or photographs to bring their quality as close as possible to the original. This technique has been widely applied in fields such as cultural heritage preservation, image editing, and medical image processing. Image inpainting technology has undergone a transition from traditional methods to modern ones. Traditional methods typically handle small areas of simply structured image texture well, but they often fail to achieve satisfactory results when faced with large missing areas and complex structural and textural information. With the rise of the big data era, deep learning methods such as generative adversarial networks have developed rapidly, substantially improving the effectiveness of image inpainting. Compared with traditional image inpainting algorithms, deep learning methods can better understand the semantic information of images, improving the accuracy and efficiency of repair: given a large amount of training data, deep learning models can fully capture image semantics and generate highly accurate repair results. However, current methods usually treat the image as a whole during repair. From a semantic perspective, the foreground and background differ significantly, and treating them uniformly may lead to problems such as blurred edges and structural deformation, resulting in unsatisfactory results. This issue is addressed by a new image inpainting framework that uses semantic label maps to separate the foreground and background for repair.
Method
The proposed image inpainting method comprises three modules: a semantic inpainting module, a foreground inpainting module, and an overall inpainting module. The semantic inpainting module repairs the defective semantic label map to guide the subsequent semantic decoupling of the foreground and background regions. In the semantic inpainting phase, the missing semantic label map is repaired, enhancing the semantic information of the missing region. The foreground mask is then extracted from the repaired semantic map to obtain accurate boundary and shape information for the foreground region. In the foreground restoration stage, the foreground region of the defective image is extracted on the basis of the foreground mask, and the foreground inpainting module restores its texture and fills the missing region. The foreground area usually contains the key information in an image, so repairing the foreground on its own yields highly accurate, detailed foreground objects and their semantic information. The restored foreground region is subsequently embedded into the missing image. Finally, the image with the restored foreground is input to the overall inpainting module, which completes two tasks: repairing the background region of the missing image and fusing the foreground and background. The overall inpainting module repairs the entire image on the basis of the contextual information around the foreground, maintaining the consistency and smoothness of the image and further improving the inpainting of the foreground region. A joint loss function is employed across the three stages. The semantic inpainting module uses adversarial loss and semantic distribution loss to improve the accuracy of semantic inpainting. The foreground inpainting and overall inpainting modules additionally incorporate perceptual loss, style loss, and global loss.
In particular, perceptual loss is used to ensure that the restoration results closely resemble the original image in terms of perception; style loss is used to reduce the occurrence of checkerboard artifacts caused by transposed convolution layers; and global loss is used to guarantee that the restored results exhibit a more coherent structure and content across the entire image. When these different types of loss functions are utilized, the proposed method can generate more realistic and natural images while maintaining high-quality inpainting results.
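The decoupling and compositing steps of the pipeline can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper names, the set of foreground class IDs, and the mask conventions are assumptions for the sake of the example.

```python
import numpy as np


def foreground_mask(semantic_map: np.ndarray, fg_labels: set) -> np.ndarray:
    """Binary mask of foreground pixels, given a set of foreground class IDs
    taken from the (repaired) semantic label map."""
    return np.isin(semantic_map, list(fg_labels)).astype(np.float32)


def composite(restored_fg: np.ndarray, damaged_img: np.ndarray,
              fg_mask: np.ndarray, hole_mask: np.ndarray) -> np.ndarray:
    """Embed the restored foreground back into the damaged image.

    Only pixels that are both foreground (fg_mask == 1) and inside the
    missing region (hole_mask == 1) are replaced; known pixels are kept,
    and the result is then passed on for overall (background) inpainting.
    """
    m = (fg_mask * hole_mask)[..., None]  # broadcast over color channels
    return restored_fg * m + damaged_img * (1.0 - m)
```

In this sketch the overall inpainting module would receive the composited image, so the background repair can condition on already-restored foreground context.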
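The combination of loss terms described above can be sketched like this. The loss weights and the feature extractor are illustrative assumptions only; the paper's actual weights, network, and semantic/adversarial terms are not reproduced here.

```python
import numpy as np


def gram(feat: np.ndarray) -> np.ndarray:
    """Gram matrix of a C x H x W feature map, normalized by its size.
    Matching Gram statistics is the standard way to express style loss."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.T / (c * h * w)


def joint_loss(feats_pred, feats_true, img_pred, img_true,
               w_perc=0.1, w_style=250.0, w_global=1.0):
    """Weighted sum of perceptual, style, and global (L1) terms.

    feats_* are lists of feature maps from a pretrained encoder (e.g. VGG);
    the weights here are placeholders, not the paper's values.
    """
    # Perceptual loss: L1 distance between deep features of output and target.
    perc = sum(np.abs(fp - ft).mean() for fp, ft in zip(feats_pred, feats_true))
    # Style loss: L1 distance between Gram matrices, penalizing texture
    # artifacts such as the checkerboard patterns from transposed convolutions.
    style = sum(np.abs(gram(fp) - gram(ft)).mean()
                for fp, ft in zip(feats_pred, feats_true))
    # Global loss: pixelwise L1 over the whole image for overall coherence.
    glob = np.abs(img_pred - img_true).mean()
    return w_perc * perc + w_style * style + w_global * glob
```

Identical prediction and target give zero loss; any mismatch in features or pixels raises it, so minimizing this sum pushes the inpainted result toward the target in perception, texture statistics, and global content at once.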
Result
Comparative experiments with other current image inpainting methods demonstrate that the proposed approach outperforms them in terms of learned perceptual image patch similarity (LPIPS), peak signal-to-noise ratio (PSNR), and the structural similarity index measure (SSIM) on the public CelebA-HQ and Cityscapes datasets. Compared with the best average values of the baseline methods, on the CelebA-HQ dataset, LPIPS decreased by 8.86%, SSIM improved by 1.10%, and the average PSNR reached 27.09 dB. On the Cityscapes dataset, LPIPS decreased by 4.62%, SSIM improved by 0.45%, and the average PSNR reached 27.31 dB. Furthermore, ablation experiments confirm the necessity and effectiveness of each component of the algorithm.
Conclusion
This image inpainting method decouples the semantics of the foreground and background and uses a three-stage pipeline to complete the inpainting step by step, effectively reducing semantic confusion and boundary blurring. In the repaired images, the foreground-background boundaries are clear, the color style is harmonious, and the semantics are coherent.