Survey of image composition based on deep learning

Ye Guosheng; Wang Jianming; Yang Zizhong; Zhang Yuhang; Cui Rongkai; Xuan Shuai

doi:10.11834/jig.220713

Review


Vol. 28, Issue 12, Pages: 3670-3698 (2023)
Published: 16 December 2023

Ye Guosheng, Wang Jianming, Yang Zizhong, Zhang Yuhang, Cui Rongkai, Xuan Shuai. 2023. Survey of image composition based on deep learning. Journal of Image and Graphics, 28(12): 3670-3698. DOI: 10.11834/jig.220713

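Abstract (translated from the Chinese)

Image composition has long been a research hotspot in image processing and has broad application prospects. Its basic goal is to accurately extract the foreground object from the original image and composite it with a new background, producing an image that is as close to real as possible. To promote research on and development of deep-learning-based image composition, this paper discusses the main problems currently faced in image composition tasks: 1) the foreground object adaptation problem, which includes geometric consistency issues such as the size, position, and geometric angle of the foreground object relative to the background image, as well as appearance consistency issues such as mutual occlusion between foreground and background and blurred edge details of the foreground object; 2) the visual harmonization problem, which includes tonal consistency issues in which the color, contrast, and saturation of foreground and background are not unified, and the missing-shadow problem in which the foreground object loses its corresponding shadow; 3) the habitat adaptation problem, which concerns the logical plausibility of the foreground object with respect to the background image. The paper summarizes the deep learning methods mainly used to address each problem, evaluates the quality of the composite images for each problem and summarizes the corresponding evaluation metrics, introduces the public datasets used for each problem, compares the deep learning methods, and describes the main application scenarios of image composition. Finally, it analyzes the remaining shortcomings of deep-learning-based image composition, offers feasible research suggestions, and gives an outlook on future directions for the field.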

Abstract

Image composition has long been a research hotspot in image processing and has a wide range of applications. The task involves accurately extracting foreground objects from an image and compositing them with a new background image. Traditional image composition methods are often time consuming and labor intensive: users must not only extract and place foreground objects manually but also adjust their lighting, saturation, edge details, shadows, and other properties so that the result looks close to a real image. With the development of deep learning in recent years, image composition has found increasingly wide application and has proven efficient. To promote research on and development of deep-learning-based image composition, this paper expounds four main problems faced in current image composition tasks.

First, the foreground object adaptation problem mainly involves adjusting the size of foreground objects, placing them spatially, handling blurred edge details, and resolving unreasonable mutual occlusion between foreground and background. Current deep learning methods for this problem include predicting appropriate bounding boxes for foreground objects in background images, spatial transformer networks, foreground object location distribution prediction and adversarial training, image fusion technology, and guided placement based on domain information. Second, the foreground object harmonization problem mainly concerns inconsistencies in visual properties, such as illumination, color, saturation, and contrast, between the foreground and background after composition. Current deep learning methods for this problem include attention-based guidance mechanisms, domain-information-based verification and discrimination methods, encoder-decoder networks, the context-modeling capabilities of Transformers, high dynamic range (HDR) images as auxiliary input, and techniques borrowed from related tasks such as style transfer. Third, the foreground object shadow harmonization problem mainly involves generating the missing shadows of foreground objects in composite images. Current deep learning methods for this problem include methods based on image rendering, shadow generation using generative adversarial networks, methods that rely on background ambient lighting, and attention-based mechanisms. Fourth, the habitat adaptation problem between the foreground object and the background mainly concerns the biological information matching that should be considered when compositing foreground objects and background images. Whether foreground objects such as animals and plants can plausibly be composited into a given background is the first question that should be considered in an image composition task. The background chosen for an object cannot deviate from the object's habitat. For instance, seagulls do not appear in the desert, and flowers do not grow from ice and snow.

The foreground object adaptation problem can be regarded as the key problem in image composition: as long as the foreground objects are composited correctly and reasonably, subsequent optimization of the composite image can be performed efficiently. Effectively solving the visual harmonization problem further improves the authenticity of composite images as perceived by users. The most important problem, however, is the adaptation of the foreground to the background habitat. Objects and background images cannot be chosen arbitrarily but must satisfy the logical relationships of reality, that is, habitat adaptation, which can be regarded as the primary task in image composition. If the habitat information does not fit, the foreground object and background scene lose their logical authenticity, and no subsequent step can make the composite image realistic.

This study summarizes the current deep learning methods, publicly available datasets, and evaluation metrics for each of the above problems, compares the different deep learning methods, and introduces applications of image composition technology. Composite images not only reduce the cost of real data acquisition but also improve the generalization ability of the trained model. The shortcomings of deep-learning-based image composition are also analyzed, feasible research suggestions are put forward, and future development directions of image composition technology are discussed.
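As a point of orientation, the basic pipeline the abstract describes (paste an extracted foreground onto a new background, then reduce the visual mismatch) can be sketched in a few lines. The example below is a minimal illustration and not a method from the surveyed literature. It assumes images are NumPy float arrays in [0, 1] and that a matte or mask is already available. The function names `alpha_composite` and `match_color_statistics` are hypothetical. The second function is a hand-crafted global mean and standard deviation color transfer, used here only as a stand-in for the learned harmonization methods the paper reviews.

```python
import numpy as np


def alpha_composite(foreground, background, alpha):
    """Blend a foreground over a background with a per-pixel matte.

    foreground, background: float arrays of shape (H, W, 3) in [0, 1].
    alpha: float array of shape (H, W) in [0, 1]; 1 means pure foreground.
    """
    foreground = np.asarray(foreground, dtype=np.float64)
    background = np.asarray(background, dtype=np.float64)
    alpha = np.asarray(alpha, dtype=np.float64)
    if foreground.shape != background.shape:
        raise ValueError("foreground and background must have the same shape")
    if alpha.shape != foreground.shape[:2]:
        raise ValueError("alpha must have shape (H, W)")
    a = alpha[..., None]  # broadcast the matte over the color channels
    return a * foreground + (1.0 - a) * background


def match_color_statistics(composite, mask, eps=1e-6):
    """Naive harmonization baseline (an assumption, not a surveyed method).

    Rescales each color channel of the foreground region so that its mean
    and standard deviation match those of the background region.
    """
    composite = np.asarray(composite, dtype=np.float64)
    fg = np.asarray(mask) > 0.5
    bg = ~fg
    out = composite.copy()
    if not fg.any() or not bg.any():
        return out  # nothing to match against
    for c in range(composite.shape[-1]):
        channel = composite[..., c]
        fg_mean, fg_std = channel[fg].mean(), channel[fg].std()
        bg_mean, bg_std = channel[bg].mean(), channel[bg].std()
        scale = bg_std / (fg_std + eps)
        out[fg, c] = (channel[fg] - fg_mean) * scale + bg_mean
    return np.clip(out, 0.0, 1.0)
```

A typical call would be `match_color_statistics(alpha_composite(fg, bg, alpha), alpha)`. Neither step addresses placement, occlusion, shadows, or habitat plausibility, which are the problems the surveyed deep learning methods target.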

Keywords

deep learning; image composition; foreground object adaptation; image harmonization; habitat adaptation

references

Abu Alhaija H， Mustikovela S K， Mescheder L， Geiger A and Rother C. 2018. Augmented reality meets computer vision： efficient data generation for urban driving scenes. International Joutnal Computter Vision， 126， 961-972 ［DOI： 10.1007/s11263-018-1070-xhttp://dx.doi.org/10.1007/s11263-018-1070-x］

Arjovsky M， Chintala S and Bottou L. 2017. Wasserstein generative adversarial networks//Proceedings of the 34th International Conference on Machine Learning. Sydney， Australia： JMLR.org： 214-223

Azadi S， Pathak D， Ebrahimi S and Darrell T. 2020. Compositional GAN： learning image-conditional binary composition. International Journal of Computer Vision， 128（10/11）： 2570-2585 ［DOI： 10.1007/s11263-020-01336-9http://dx.doi.org/10.1007/s11263-020-01336-9］

Barron J T and Malik J. 2015. Shape， illumination， and reflectance from shading. IEEE Transactions on Pattern Analysis and Machine Intelligence， 37（8）： 1670-1687 ［DOI： 10.1109/TPAMI.2014.2377712http://dx.doi.org/10.1109/TPAMI.2014.2377712］

Bazazian D， Calway A and Damen D. 2022. Dual-domain image synthesis using segmentation-guided GAN ［EB/OL］. ［2022-05-20］. https://arxiv.org/pdf/2204.09015.pdfhttps://arxiv.org/pdf/2204.09015.pdf

Brasó G and Leal-Taixé L. 2020. Learning a neural solver for multiple object tracking//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle， USA： IEEE： 6246-6256 ［DOI： 10.1109/CVPR42600.2020.00628http://dx.doi.org/10.1109/CVPR42600.2020.00628］

Burt P and Adelson E. 1983a. The laplacian pyramid as a compact image code. IEEE Transactions on Communications， 31（4）： 532-540 ［DOI： 10.1109/TCOM.1983.1095851http://dx.doi.org/10.1109/TCOM.1983.1095851］

Burt P J and Adelson E H. 1983b. A multiresolution spline with application to image mosaics. ACM Transactions on Graphics， 2（4）： 217-236［DOI： 10.1145/245.247http://dx.doi.org/10.1145/245.247］

Bychkovsky V， Paris S， Chan E and Durand F. 2011. Learning photographic global tonal adjustment with a database of input/output image pairs//Proceedings of 2011 CVPR. Colorado Springs， USA： IEEE： 97-104 ［DOI： 10.1109/CVPR.2011.5995413http://dx.doi.org/10.1109/CVPR.2011.5995413］

Cao J Y， Cong W Y， Niu L， Zhang J F， Gao X S， Tang Z W and Zhang L Q. 2022. Deep image harmonization by bridging the reality gap ［EB/OL］. ［2022-05-20］. https://arxiv.org/pdf/2103.17104.pdfhttps://arxiv.org/pdf/2103.17104.pdf

Chang A X， Funkhouser T， Guibas L， Hanrahan P， Huang Q X， Li Z M， Savarese S， Savva M， Song S R， Su H， Xiao J X， Yi L and Yu F. 2015. ShapeNet： an information-rich 3D model repository ［EB/OL］. ［2022-05-20］. https://arxiv.org/pdf/1512.03012.pdfhttps://arxiv.org/pdf/1512.03012.pdf

Chen B C and Kae A. 2019. Toward realistic image compositing with adversarial learning//Processdings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach， USA： IEEE： 8407-8416 ［DOI： 10.1109/CVPR.2019.00861http://dx.doi.org/10.1109/CVPR.2019.00861］

Chen Q F， Li D Z Y and Tang C K. 2012. KNN matting//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence， USA： IEEE： 869-876 ［DOI： 10.1109/CVPR.2012.6247760http://dx.doi.org/10.1109/CVPR.2012.6247760］

Chen S H， Wei Y K， Xu L， Dong X H and Wen K Z. 2019. Survey of image style transfer based on deep learning. Application Research of Computers， 36（8）： 2250-2255

陈淑環，韦玉科，徐乐，董晓华，温坤哲. 2019. 基于深度学习的图像风格迁移研究综述. 计算机应用研究， 36（8）： 2250-2255 ［DOI： 10.19734/j.issn.1001-3695.2018.05.0270http://dx.doi.org/10.19734/j.issn.1001-3695.2018.05.0270］

Cheng D C， Shi J， Chen Y Y， Deng X M and Zhang X P. 2018. Learning scene illumination by pairwise photos from rear and front mobile cameras. Computer Graphics Forum， 37（7）： 213-221 ［DOI： 10.1111/cgf.13561http://dx.doi.org/10.1111/cgf.13561］

Cheng D L， Prasad D K and Brown M S. 2014. Illuminant estimation for color constancy： why spatial-domain methods work and the role of the color distribution. Journal of the Optical Society of America A， 31（5）： 1049-1058 ［DOI： 10.1364/JOSAA.31.001049http://dx.doi.org/10.1364/JOSAA.31.001049］

Cong W Y， Niu L， Zhang J F， Liang J and Zhang L Q. 2021. Bargainnet： background-guided domain translation for image harmonization//Proceedings of 2021 IEEE International Conference on Multimedia and Expo. Shenzhen， China： IEEE： #9428394 ［DOI： 10.1109/ICME51207.2021.9428394http://dx.doi.org/10.1109/ICME51207.2021.9428394］

Cong W Y， Tao X H， Niu L， Liang J， Gao X S， Sun Q H and Zhang L Q. 2022. High-resolution image harmonization via collaborative dual transformations ［EB/OL］. ［2022-05-20］. https://arxiv.org/pdf/2109.06671.pdfhttps://arxiv.org/pdf/2109.06671.pdf

Cong W Y， Zhang J F， Niu L， Liu L， Ling Z X， Li W Y and Zhang L Q. 2020a. DoveNet： deep image harmonization via domain verification//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle， USA： IEEE： 8391-8400 ［DOI： 10.1109/CVPR42600.2020.00842http://dx.doi.org/10.1109/CVPR42600.2020.00842］

Cong W Y， Zhang J F， Niu L， Liu L， Ling Z X， Li W Y and Zhang L Q. 2020b. Image harmonization dataset iharmony4： HCOCO， HAdobe5k， HFlickr， and hday2night ［EB/OL］. ［2022-05-20］. https://arxiv.org/pdf/1908.10526.pdfhttps://arxiv.org/pdf/1908.10526.pdf

Cun X D and Pun C M. 2020. Improving the harmony of the composite image by spatial-separated attention module. IEEE Transactions on Image Processing， 29： 4759-4771 ［DOI： 10.1109/TIP.2020.2975979http://dx.doi.org/10.1109/TIP.2020.2975979］

Dematteis N and Giordan D. 2021. Comparison of digital image correlation methods and the impact of noise in geoscience applications. Remote Sensing， 13（2）： #327 ［DOI： 10.3390/rs13020327http://dx.doi.org/10.3390/rs13020327］

Dowson D C and Landau B V. 1982. The Fréchet distance between multivariate normal distributions. Journal of Multivariate Analysis， 12（3）： 450-455 ［DOI： 10.1016/0047-259X（82）90077-Xhttp://dx.doi.org/10.1016/0047-259X（82）90077-X］

Du C B and Gao S S. 2017. Image segmentation-based multi-focus image fusion through multi-scale convolutional neural network. IEEE Access， 5： 15750-15761 ［DOI： 10.1109/ACCESS.2017.2735019http://dx.doi.org/10.1109/ACCESS.2017.2735019］

El Helou M， Zhou R F， Barthas J and Süsstrunk S. 2020. VIDIT： virtual image dataset for illumination transfer ［EB/OL］. ［2022-05-20］. https://arxiv.org/pdf/2005.05460.pdfhttps://arxiv.org/pdf/2005.05460.pdf

Gardner M A， Hold-Geoffroy Y， Sunkavalli K， Gagné C and Lalonde J F. 2019. Deep parametric indoor lighting estimation//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul， Korea （South）： IEEE： 7174-7182 ［DOI： 10.1109/ICCV.2019.00727http://dx.doi.org/10.1109/ICCV.2019.00727］

Gardner M A， Sunkavalli K， Yumer E， Shen X H， Gambaretto E， Gagné C and Lalonde J F. 2017. Learning to predict indoor illumination from a single image. ACM Transactions on Graphics， 36（6）： #176 ［DOI： 10.1145/3130800.3130891http://dx.doi.org/10.1145/3130800.3130891］

Garon M， Sunkavalli K， Hadap S， Carr N and Lalonde J F. 2019. Fast spatially-varying indoor lighting estimation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach， USA： IEEE： 6901-6910 ［DOI： 10.1109/CVPR.2019.00707http://dx.doi.org/10.1109/CVPR.2019.00707］

Gatys L， Ecker A and Bethge M. 2016a. A neural algorithm of artistic style. Journal of Vision， 16（12）： #326 ［DOI： 10.1167/16.12.326http://dx.doi.org/10.1167/16.12.326］

Gatys L A， Ecker A S and Bethge M. 2016b. Image style transfer using convolutional neural networks//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas， USA： IEEE： 2414-2423 ［DOI： 10.1109/CVPR.2016.265http://dx.doi.org/10.1109/CVPR.2016.265］

Gehler P V， Rother C， Blake A， Minka T and Sharp T. 2008. Bayesian color constancy revisited//Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage， USA： IEEE： #4587765 ［DOI： 10.1109/CVPR.2008.4587765http://dx.doi.org/10.1109/CVPR.2008.4587765］

Gkioulekas I and Zhi T C. 2017. Computational photography ［EB/OL］. ［2022-05-20］. http://graphics.cs.cmu.edu/courses/15-463/2017_fall/lectures/lecture7.pdfhttp://graphics.cs.cmu.edu/courses/15-463/2017_fall/lectures/lecture7.pdf

Goodfellow I J， Pouget-Abadie J， Mirza M， Xu B， Warde-Farley D， Ozair S， Courville A and Bengio Y. 2014. Generative adversarial nets//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montréal， Canada： MIT Press： 2672-2680

Grosse R， Johnson M K， Adelson E H and Freeman W T. 2009. Ground truth dataset and baseline evaluations for intrinsic image algorithms//Proceedings of the 12th IEEE International Conference on Computer Vision. Kyoto， Japan： IEEE： 2335-2342 ［DOI： 10.1109/ICCV.2009.5459428http://dx.doi.org/10.1109/ICCV.2009.5459428］

Guo Z H， Guo D S， Zheng H Y， Gu Z R， Zheng B and Dong J Y. 2021a. Image harmonization with transformer//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal， Canada： IEEE： 14850-14859 ［DOI： 10.1109/ICCV48922.2021.01460http://dx.doi.org/10.1109/ICCV48922.2021.01460］

Guo Z H， Zheng H Y， Jiang Y F， Gu Z R and Zheng B. 2021b. Intrinsic image harmonization//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville， USA： IEEE： 16362-16371 ［DOI： 10.1109/CVPR46437.2021.01610http://dx.doi.org/10.1109/CVPR46437.2021.01610］

Hao G Q. Iizuka S and Fukui K. 2020. Image harmonization with attention-based deep feature modulation//Proceedings of the 31st British Machine Vision Conference.Virtual Event， UK： BMVA Press 2020

He J J. 2020. Face Image Synthesis Method Research and Application Using Machine Learning Based Image Generation Algorithm. Hefei， China： University of Science and Technology of China

何冀军. 2020. 用于图像生成的机器学习算法在人像合成中的研究与应用. 合肥：中国科学技术大学

He K M， Zhang X Y， Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas， USA： IEEE： 770-778 ［DOI： 10.1109/cvpr.2016.90http://dx.doi.org/10.1109/cvpr.2016.90］

Heusel M， Ramsauer H， Unterthiner T， Nessler B and Hochreiter S. 2017. GANs trained by a two time-scale update rule converge to a local nash equilibrium//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach， USA： Curran Associates Inc.： 6629-6640

Hold-Geoffroy Y， Athawale A and Lalonde J F. 2019. Deep sky modeling for single image outdoor lighting estimation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach， USA： IEEE： 6920-6928 ［DOI： 10.1109/CVPR.2019.00709http://dx.doi.org/10.1109/CVPR.2019.00709］

Hong Y， Niu L， Zhang J F and Zhang L Q. 2022. Shadow generation for composite image in real-world scenes ［EB/OL］. ［2022-05-20］. https://arxiv.org/pdf/2104.10338.pdfhttps://arxiv.org/pdf/2104.10338.pdf

Hou L， Vicente T F Y， Hoai M and Samaras D. 2021. Large scale shadow annotation and detection using lazy annotation and stacked CNNs. IEEE Transactions on Pattern Analysis and Machine Intelligence， 43（4）： 1337-1351 ［DOI： 10.1109/TPAMI.2019.2948011http://dx.doi.org/10.1109/TPAMI.2019.2948011］

Hu X W， Jiang Y T， Fu C W and Heng P A. 2019. Mask-ShadowGan： learning to remove shadows from unpaired data//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul， Korea （South）： IEEE： 2472-2481 ［DOI： 10.1109/iccv.2019.00256http://dx.doi.org/10.1109/iccv.2019.00256］

Hu Z Y， Nsampi N E， Wang X and Wang Q. 2021. NeurSF： neural shading field for image harmonization ［EB/OL］. ［2022-05-20］. https://arxiv.org/pdf/2112.01314.pdfhttps://arxiv.org/pdf/2112.01314.pdf

Huang H X and Niu L. 2022. ccHarmony： color-checker based image harmonization dataset ［EB/OL］. ［2022-05-20］. https://arxiv.org/pdf/2206.00800.pdfhttps://arxiv.org/pdf/2206.00800.pdf

Isola P， Zhu J Y， Zhou T H and Efros A A. 2017. Image-to-image translation with conditional adversarial networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu， USA： IEEE： 5967-5976 ［DOI： 10.1109/CVPR.2017.632http://dx.doi.org/10.1109/CVPR.2017.632］

Jaderberg M， Simonyan K， Zisserman A and Kavukcuoglu K. 2015. Spatial transformer networks//Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal， Canada： MIT Press： 2017-2025 ［DOI： 10.5555/2969442.2969465http://dx.doi.org/10.5555/2969442.2969465］

Jiang Y F， Zhang H， Zhang J M， Wang Y L， Lin Z， Sunkavalli K， Chen S， Amirghodsi S， Kong S and Wang Z Y. 2021. SSH： a self-supervised framework for image harmonization//Proce-edings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal， Canada： IEEE： 4812-4821 ［DOI： 10.1109/iccv48922.2021.00479http://dx.doi.org/10.1109/iccv48922.2021.00479］

Kaur H， Koundal D and Kadyan V. 2021. Image fusion techniques： a survey. Archives of Computational Methods in Engineering， 28（7）： 4425-4447 ［DOI：10.1007/s11831-021-09540-7http://dx.doi.org/10.1007/s11831-021-09540-7］

Laffont P Y， Ren Z L， Tao X F， Qian C and Hays J. 2014. Transient attributes for high-level understanding and editing of outdoor scenes. ACM Transactions on Graphics， 33（4）： #149 ［DOI： 10.1145/2601097.2601101http://dx.doi.org/10.1145/2601097.2601101］

Lalonde J F and Efros A A. 2007. Using color compatibility for assessing image realism//Proceedings of the 11th International Conference on Computer Vision. Rio de Janeiro， Brazil： IEEE： #4409107 ［DOI： 10.1109/ICCV.2007.4409107http://dx.doi.org/10.1109/ICCV.2007.4409107］

Lee D， Liu S F， Gu J W， Liu M Y， Yang M H and Kautz J. 2018. Context-aware synthesis and placement of object instances//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal， Canada： Curran Associates Inc.： 10414-10424 ［DOI： 10.5555/3327546.3327701http://dx.doi.org/10.5555/3327546.3327701］

Li X T， Liu S F， Kim K， Wang X L， Yang M H and Kautz J. 2019. Putting humans in a scene： learning affordance in 3D indoor environments//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach， USA： IEEE： 12360-12368 ［DOI： 10.1109/CVPR.2019.01265http://dx.doi.org/10.1109/CVPR.2019.01265］

Liao B， Zhu Y， Liang C， Luo F and Xiao C X. 2019. Illumination animating and editing in a single picture using scene structure estimation. Computers and Graphics， 82： 53-64 ［DOI： 10.1016/j.cag.2019.05.007http://dx.doi.org/10.1016/j.cag.2019.05.007］

Lin C H， Yumer E， Wang O， Shechtman E and Lucey S. 2018. ST-GAN： spatial transformer generative adversarial networks for image compositing//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City， USA： IEEE： 9455-9464 ［DOI： 10.1109/CVPR.2018.00985http://dx.doi.org/10.1109/CVPR.2018.00985］

Lin T Y， Maire M， Belongie S， Hays J， Perona P， Ramanan D， Dollár P and Zitnick C L. 2014. Microsoft COCO： common objects in context//Proceedings of the 13th European Conference on Computer Vision. Zurich， Switzerland： Springer： 740-755 ［DOI： 10.1007/978-3-319-10602-1_48http://dx.doi.org/10.1007/978-3-319-10602-1_48］

Ling J， Xue H， Song L， Xie R and Gu X. 2021. Region-aware adaptive instance normalization for image harmonization//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville， USA： IEEE： 9357-9366 ［DOI： 10.1109/CVPR46437.2021.00924http://dx.doi.org/10.1109/CVPR46437.2021.00924］

Liu B， Xu K and Martin R R. 2017. Static scene illumination estimation from videos with applications. Journal of Computer Science and Technology， 32（3）： 430-442 ［DOI： 10.1007/s11390-017-1734-yhttp://dx.doi.org/10.1007/s11390-017-1734-y］

Liu D Q， Long C J， Zhang H P， Yu H N， Dong X Z and Xiao C X. 2020. ARShadowGAN： shadow generative adversarial network for augmented reality in single light scenes//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle， USA： IEEE： 8136-8145 ［DOI： 10.1109/cvpr42600.2020.00816http://dx.doi.org/10.1109/cvpr42600.2020.00816］

Liu L， Liu Z C， Zhang B， Li J T， Niu L， Liu Q Y and Zhang L Q. 2021. OPA： object placement assessment dataset ［EB/OL］. ［2022-05-20］. https://arxiv.org/pdf/2107.01889.pdfhttps://arxiv.org/pdf/2107.01889.pdf

Luan F J， Paris S， Shechtman E and Bala K. 2018. Deep painterly harmonization. Computer Graphics Forum， 37（4）： 95-106 ［DOI： 10.1111/cgf.13478http://dx.doi.org/10.1111/cgf.13478］

Make Human Community. 2022. MakeHuman： open source tool for making 3D characters ［EB/OL］. ［2022-05-20］. http://www.makehumancommunity.orghttp://www.makehumancommunity.org

Miao H， Lu F X， Liu Z D， Zhang L J， Manocha D and Zhou B. 2021. Robust 2D/3D vehicle parsing in arbitrary camera views for CVIS//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal， Canada： IEEE： 15611-15620 ［DOI： 10.1109/ICCV48922.2021.01534http://dx.doi.org/10.1109/ICCV48922.2021.01534］

Mirza M and Osindero S. 2014. Conditional generative adversarial nets ［EB/OL］. ［2022-05-20］. https://arxiv.org/pdf/1411.1784.pdfhttps://arxiv.org/pdf/1411.1784.pdf

Nguyen V， Vicente T F Y， Zhao M Z， Hoai M and Samaras D. 2017. Shadow detection with conditional generative adversarial networks//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice， Italy： IEEE： 4520-4528 ［DOI： 10.1109/ICCV.2017.483http://dx.doi.org/10.1109/ICCV.2017.483］

Niu L， Cong W Y， Liu L， Hong Y， Zhang B， Liang J and Zhang L Q. 2021. Making images real again： a comprehensive survey on deep image composition ［EB/OL］. ［2022-05-20］. https://arxiv.org/pdf/2106.14490.pdfhttps://arxiv.org/pdf/2106.14490.pdf

Pandey R， Escolano S O， Legendre C， Häne C， Bouaziz S， Rhemann C， Debevec P and Fanello S. 2021. Total relighting： learning to relight portraits for background replacement. ACM Transactions on Graphics， 40（4）： #43 ［DOI： 10.1145/3450626.3459872http://dx.doi.org/10.1145/3450626.3459872］

Paramanandham N and Rajendiran K. 2018. Multi sensor image fusion for surveillance applications using hybrid image fusion algorithm. Multimedia Tools and Applications， 77（10）： 12405-12436 ［DOI： 10.1007/s11042-017-4895-3http://dx.doi.org/10.1007/s11042-017-4895-3］

Patil V， Sale D and Joshi M A. 2013. Image fusion methods and quality assessment parameters. Asian Journal of Engineering and Applied Technology， 2（1）： 40-46

Peng J L， Luo Z K， Liu L， Zhang B S， Wang T， Wang Y B， Tai Y， Wang C J and Lin W Y. 2022. FRIH： fine-grained region-aware image harmonization ［EB/OL］. ［2022-05-20］. https://arxiv.org/pdf/2205.06448.pdfhttps://arxiv.org/pdf/2205.06448.pdf

Pérez P， Gangnet M and Blake A. 2003. Poisson image editing//Proceedings of ACM SIGGRAPH 2003. San Diego， USA： Association for Computing Machinery： 313-318 ［DOI： 10.1145/1201775.882269http://dx.doi.org/10.1145/1201775.882269］

Qu G H， Zhang D L and Yan P F. 2002. Information measure for performance of image fusion. Electronics Letters， 38（7）： 313-315 ［DOI： 10.1049/el：20020212http://dx.doi.org/10.1049/el：20020212］

Ros G， Sellart L， Materzynska J， Vazquez D and Lopez A M. 2016. The SYNTHIA dataset： a large collection of synthetic images for semantic segmentation of urban scenes//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas， USA： IEEE： 3234-3243 ［DOI： 10.1109/CVPR.2016.352http://dx.doi.org/10.1109/CVPR.2016.352］

Sankaranarayanan S， Balaji Y， Jain A， Lim S N and Chellappa R. 2018. Learning from synthetic data： addressing domain shift for semantic segmentation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City， USA： IEEE： 3752-3761 ［DOI： 10.1109/cvpr.2018.00395http://dx.doi.org/10.1109/cvpr.2018.00395］

Schieber T A， Carpi L， Díaz-Guilera A， Pardalos P M， Masoller C and Ravetti M G. 2017. Quantification of network structural dissimilarities. Nature Communications， 8（1）： #13928 ［DOI： 10.1038/ncomms13928http://dx.doi.org/10.1038/ncomms13928］

Sheng Y C， Zhang J M and Benes B. 2021. SSN： soft shadow network for image compositing//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville， USA： IEEE： 4378-4388 ［DOI： 10.1109/CVPR46437.2021.00436http://dx.doi.org/10.1109/CVPR46437.2021.00436］

Shermeyer J， Hossler T， Etten A V， Hogan D， Lewis R and Kim D. 2021. RarePlanes： synthetic data takes flight//Proceedings of 2021 IEEE Winter Conference on Applications of Computer Vision. Waikoloa， USA： IEEE： 207-217 ［DOI： 10.1109/wacv48630.2021.00025http://dx.doi.org/10.1109/wacv48630.2021.00025］

Simonyan K and Zisserman A. 2015. Verydeep convolutional networks for large-scale image recognition//Proceedings of the 3rd International Conference on Learning Representations. San Diego， USA， #1556 ［DOI： 10.48550/arXiv.1409.1556http://dx.doi.org/10.48550/arXiv.1409.1556］

Sofiiuk K， Popenova P and Konushin A. 2021. Foreground-aware semantic representations for image harmonization//Proceedings of 2021 IEEE Winter Conference on Applications of Computer Vision. Waikoloa， USA： IEEE： 1619-1628 ［DOI： 10.1109/wacv48630.2021.00166http://dx.doi.org/10.1109/wacv48630.2021.00166］

Strickland E. 2022. Are you still using real data to train your AI？［EB/OL］. ［2022-05-20］. https://spectrum.ieee.org/synthetic-data-aihttps://spectrum.ieee.org/synthetic-data-ai

Sun T C， Barron J T， Tsai Y T， Xu Z X， Yu X M， Fyffe G， Rhemann C， Busch J， Debevec P and Ramamoorthi R. 2019. Single image portrait relighting. ACM Transactions on Graphics， 38（4）： #79 ［DOI： 10.1145/3306346.3323008http://dx.doi.org/10.1145/3306346.3323008］

Szeliski R. 2011. Computer Vision： Algorithms and Applications. New York， USA： Springer

Tan F W， Bernier C， Cohen B， Ordonez V and Barnes C. 2018. Where and who？ Automatic semantic-aware person composition//Proceedings of 2018 IEEE Winter Conference on Applications of Computer Vision. Lake Tahoe， USA： IEEE： 1519-1528 ［DOI： 10.1109/WACV.2018.00170http://dx.doi.org/10.1109/WACV.2018.00170］

Tan X H， Xu P P， Guo S H and Wang W C. 2019. Image composition of partially occluded objects. Computer Graphics Forum， 38（7）： 641-650 ［DOI： 10.1111/cgf.13867http://dx.doi.org/10.1111/cgf.13867］

Tremblay J， Prakash A， Acuna D， Brophy M， Jampani V， Anil C， To T， Cameracci E， Boochoon S and Birchfield S. 2018. Training deep networks with synthetic data： bridging the reality gap by domain randomization//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Salt Lake City， USA： IEEE： 10820-10828 ［DOI： 10.1109/cvprw.2018.00143http://dx.doi.org/10.1109/cvprw.2018.00143］

Tripathi S， Chandra S， Agrawal A， Tyagi A， Rehg J M and Chari V. 2019. Learning to generate synthetic data via compositing//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach， USA： IEEE： 461-470 ［DOI： 10.1109/CVPR.2019.00055http://dx.doi.org/10.1109/CVPR.2019.00055］

Tsai Y H， Shen X H， Lin Z， Sunkavalli K， Lu X and Yang M H. 2017. Deep image harmonization//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu， USA： IEEE： 2799-2807 ［DOI： 10.1109/cvpr.2017.299http://dx.doi.org/10.1109/cvpr.2017.299］

Valanarasu J M J， Zhang H， Zhang J M， Wang Y L， Lin Z， Echevarria J， Ma Y L， Wei Z J， Sunkavalli K and Patel V. 2023. Interactive portrait harmonization//Proceedings of the 11th International Conference on Learning Representations. Kigali， Rwanda： OpenReview.net

Wang J F， Li X and Yang J. 2018. Stacked conditional generative adversarial networks for jointly learning shadow detection and shadow removal//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City， USA： IEEE： 1788-1797 ［DOI： 10.1109/CVPR.2018.00192http://dx.doi.org/10.1109/CVPR.2018.00192］

Wang J M， Cheng X Y， Yang Z Z， Shi C Y， Zhang Y H and Qian Z K. 2022. Influence of different data augmentation methods on model recognition accuracy. Computer Science， 49（6A）： 418-423

王建明，陈响育，杨自忠，史晨阳，张宇航，钱正坤. 2022. 不同数据增强方法对模型识别精度的影响. 计算机科学， 49（6A）： 418-423 ［DOI： 10.11896/jsjkx.210700210http://dx.doi.org/10.11896/jsjkx.210700210］

Wang T Y， Hu X W， Wang Q， Heng P A and Fu C W. 2020. Instance shadow detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle， USA： IEEE： 1877-1886 ［DOI： 10.1109/CVPR42600.2020.00195http://dx.doi.org/10.1109/CVPR42600.2020.00195］

Wang Z and Bovik A C. 2002. A universal image quality index. IEEE Signal Processing Letters， 9（3）： 81-84 ［DOI： 10.1109/97.995823http://dx.doi.org/10.1109/97.995823］

Wang Z， Bovik A C， Sheikh H R and Simoncelli E P. 2004. Image quality assessment： from error visibility to structural similarity. IEEE Transactions on Image Processing， 13（4）： 600-612 ［DOI： 10.1109/TIP.2003.819861http://dx.doi.org/10.1109/TIP.2003.819861］

Ward D， Moghadam P and Hudson N. 2018. Deep leaf segmentation using synthetic data//Proceedings of 2018 British Machine Vision Conference. Newcastle， UK： BMVA Press

Weber H， Prévost D and Lalonde J F. 2018. Learning to estimate indoor lighting from 3D object//Proceedings of 2018 International Conference on 3D Vision. Verona， Italy： IEEE： 199-207 ［DOI： 10.1109/3dv.2018.00032http://dx.doi.org/10.1109/3dv.2018.00032］

Wu H and Xu D. 2012. Survey of digital image compositing. Journal of Image and Graphics， 17（11）： 1333-1346

吴昊，徐丹. 2012. 数字图像合成技术综述. 中国图象图形学报， 17（11）： 1333-1346 ［DOI： 10.11834/jig.20121101http://dx.doi.org/10.11834/jig.20121101］

Wu H K， Zheng S， Zhang J G and Huang K Q. 2019. GP-GAN： towards realistic high-resolution image blending//Proceedings of the 27th ACM International Conference on Multimedia. Nice， France： Association for Computing Machinery： 2487-2495 ［DOI： 10.1145/3343031.3350944http://dx.doi.org/10.1145/3343031.3350944］

Xu N， Price B， Cohen S and Huang T. 2017. Deep image matting//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu， USA： IEEE： 311-320 ［DOI： 10.1109/cvpr.2017.41http://dx.doi.org/10.1109/cvpr.2017.41］

Xue S， Agarwala A， Dorsey J and Rushmeier H. 2012. Understanding and improving the realism of image composites. ACM Transactions on Graphics， 31（4）： #84 ［DOI： 10.1145/2185520.2185580http://dx.doi.org/10.1145/2185520.2185580］

Zhan F N， Huang J X and Lu S J. 2021a. Hierarchy composition gan for high-fidelity image synthesis ［EB/OL］. ［2022-05-20］. https://arxiv.org/pdf/1905.04693.pdfhttps://arxiv.org/pdf/1905.04693.pdf

Zhan F N， Lu S J， Zhang C G， Ma F Y and Xie X S. 2021b. Adversarial image composition with auxiliary illumination//Proceedings of the 15th Asian Conference on Computer Vision. Kyoto， Japan： Springer： 234-250 ［DOI： 10.1007/978-3-030-69532-3_15http://dx.doi.org/10.1007/978-3-030-69532-3_15］

Zhan F N， Zhu H Y and Lu S J. 2019. Spatial fusion GAN for image synthesis//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach， USA： IEEE： 3648-3657 ［DOI： 10.1109/CVPR.2019.00377http://dx.doi.org/10.1109/CVPR.2019.00377］

Zhang H， Zhang J M， Perazzi F， Lin Z and Patel V M. 2021. Deep image compositing//Proceedings of 2021 IEEE Winter Conference on Applications of Computer Vision. Waikoloa， USA： IEEE： 365-374 ［DOI： 10.1109/WACV48630.2021.00041http://dx.doi.org/10.1109/WACV48630.2021.00041］

Zhang J S， Sunkavalli K， Hold-Geoffroy Y， Hadap S， Eisenman J and Lalonde J F. 2019a. All-weather deep outdoor lighting estimation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach， USA： IEEE： 10150-10158 ［DOI： 10.1109/CVPR.2019.01040http://dx.doi.org/10.1109/CVPR.2019.01040］

Zhang L Z， Wen T， Min J， Wang J C， Han D and Shi J B. 2020a. Learning object placement by inpainting for compositional data augmentation//Proceedings of the 16th European Conference on Computer Vision. Glasgow， UK： Springer： 566-581 ［DOI： 10.1007/978-3-030-58601-0_34http://dx.doi.org/10.1007/978-3-030-58601-0_34］

Zhang L Z， Wen T and Shi J B. 2020b. Deep image blending//Proceedings of 2020 IEEE Winter Conference on Applications of Computer Vision. Snowmass， USA： IEEE： 231-240 ［DOI： 10.1109/WACV45572.2020.9093632http://dx.doi.org/10.1109/WACV45572.2020.9093632］

Zhang R， Isola P， Efros A A， Shechtman E and Wang O. 2018. The unreasonable effectiveness of deep features as a perceptual metric//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City， USA： IEEE： 586-595 ［DOI： 10.1109/CVPR.2018.00068http://dx.doi.org/10.1109/CVPR.2018.00068］

Zhang S Y， Liang R Z and Wang M. 2019b. ShadowGAN： shadow synthesis for virtual objects with conditional adversarial networks. Computational Visual Media， 5（1）： 105-115 ［DOI： 10.1007/s41095-019-0136-1http://dx.doi.org/10.1007/s41095-019-0136-1］

Zhao H S， Shen X H， Lin Z， Sunkavalli K， Price B and Jia J Y. 2018. Compositing-aware image search//Proceedings of the 15th European Conference on Computer Vision. Munich， Germany： Springer： 517-532 ［DOI： 10.1007/978-3-030-01219-9_31http://dx.doi.org/10.1007/978-3-030-01219-9_31］

Zhao L， Gao X B and Tian C N. 2013. Review of frontal face image synthesis methods. Journal of Image and Graphics， 18（1）： 1-10

赵林，高新波，田春娜. 2013. 正面人脸图像合成方法综述. 中国图象图形学报， 18（1）： 1-10 ［DOI： 10.11834/jig.20130101http://dx.doi.org/10.11834/jig.20130101］

Zhao Y N， Price B， Cohen S and Gurari D. 2019. Unconstrained foreground object search//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul， Korea （South）： IEEE： 2030-2039 ［DOI： 10.1109/ICCV.2019.00212http://dx.doi.org/10.1109/ICCV.2019.00212］

Zhou B L， Zhao H， Puig X， Fidler S， Barriuso A and Torralba A. 2017. Scene parsing through ADE20K dataset//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu， USA： IEEE： 5122-5130 ［DOI： 10.1109/CVPR.2017.544http://dx.doi.org/10.1109/CVPR.2017.544］

Zhou B L， Zhao H， Puig X， Xiao T T， Fidler S， Barriuso A and Torralba A. 2019. Semantic understanding of scenes through the ADE20K dataset. International Journal of Computer Vision， 127（3）： 302-321 ［DOI： 10.1007/s11263-018-1140-0http://dx.doi.org/10.1007/s11263-018-1140-0］

Zhou H， Sattler T and Jacobs D W. 2016. Evaluating local features for day-night matching//Proceedings of 2016 European Conference on Computer Vision. Amsterdam， the Netherlands： Springer： 724-736 ［DOI： 10.1007/978-3-319-49409-8_60http://dx.doi.org/10.1007/978-3-319-49409-8_60］

Zhou P， Han X T， Morariu V I and Davis L S. 2018. Learning rich features for image manipulation detection//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City， USA： IEEE： 1053-1061 ［DOI： 10.1109/CVPR.2018.00116http://dx.doi.org/10.1109/CVPR.2018.00116］

Zhou S Y， Liu L， Niu L and Zhang L Q. 2022. Learning object placement via dual-path graph completion//Proceedings of the 17th European Conference on Computer Vision. Tel Aviv， Israel： Springer： 373-389 ［DOI： 10.1007/978-3-031-19790-1_23http://dx.doi.org/10.1007/978-3-031-19790-1_23］

Zhu J Y， Krähenbühl P， Shechtman E and Efros A A. 2015. Learning a discriminative model for the perception of realism in composite images//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago， Chile： IEEE： 3943-3951 ［DOI： 10.1109/iccv.2015.449http://dx.doi.org/10.1109/iccv.2015.449］

Zhu S J， Lin Z， Cohen S， Kuen J， Zhang Z F and Chen C. 2022a. GALA： toward geometry-and-lighting-aware object search for compositing//Proceedings of the 17th European Conference. Tel Aviv， Israel： Springer： 676-692 ［DOI： 10.1007/978-3-031-19812-0_39http://dx.doi.org/10.1007/978-3-031-19812-0_39］

Zhu Z Y， Zhang Z， Lin Z， Wu R Q and Guo C L. 2022b. Image harmonization by matching regional references ［EB/OL］. ［2022-05-20］. https://arxiv.org/pdf/2204.04715.pdfhttps://arxiv.org/pdf/2204.04715.pdf

Zuo Y H. 2011. Environmental Studies. 2nd ed. Beijing， China： Higher Education Press： 183-184

左玉辉. 2011. 环境学.2版. 北京：高等教育出版社）： 183-184
