Shadow removal with wavelet-based non-uniform diffusion model
2025, Vol. 30, No. 1, Pages: 66-82
Print publication date: 2025-01-16
DOI: 10.11834/jig.230904
Huang Ying, Cheng Bin, Fang Shaojie, Liu Xin. Shadow removal with wavelet-based non-uniform diffusion model[J]. Journal of Image and Graphics, 2025, 30(1): 66-82.
Objective
Existing shadow removal methods usually rely on pixel-level reconstruction and aim to learn a deterministic mapping between shadow images and shadow-free images. However, shadow removal focuses on the local restoration of shadow regions, which easily damages non-shadow regions while shadows are being removed. In addition, most existing diffusion models suffer from long processing times and sensitivity to resolution when restoring images. To address these problems, a wavelet-based non-uniform diffusion model for shadow removal is proposed.
Method
The image is first decomposed into low-frequency and high-frequency components by wavelet decomposition. Diffusion generation networks are then designed separately for the low-frequency and high-frequency components to reconstruct the wavelet-domain distribution of the shadow-free image and to restore the various kinds of degraded information in these components, such as the low-frequency information (color and brightness) and the high-frequency details.
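To make the wavelet step concrete, the following is a minimal sketch (not the paper's released code) of a single-level Haar decomposition and its inverse: the four sub-bands have half the spatial size of the input, and the original image can be reconstructed exactly, which is what lets the diffusion model work on smaller wavelet-domain images without losing information. The function names and tensor shapes are illustrative.

```python
import torch

def haar_dwt2(x):
    """Single-level 2D Haar transform. x: (..., H, W) with even H and W.
    Returns the four sub-bands (LL, LH, HL, HH), each of spatial size (H/2, W/2)."""
    a = x[..., 0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2  # low-frequency band: overall color/brightness structure
    lh = (a + b - c - d) / 2  # high-frequency detail (differences between rows)
    hl = (a - b + c - d) / 2  # high-frequency detail (differences between columns)
    hh = (a - b - c + d) / 2  # high-frequency detail (diagonal differences)
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: reconstructs the original image exactly."""
    H, W = ll.shape[-2] * 2, ll.shape[-1] * 2
    x = ll.new_zeros(*ll.shape[:-2], H, W)
    x[..., 0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[..., 0::2, 1::2] = (ll + lh - hl - hh) / 2
    x[..., 1::2, 0::2] = (ll - lh + hl - hh) / 2
    x[..., 1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x

# A 3 x 256 x 256 image yields four 3 x 128 x 128 sub-bands, so the diffusion model
# works on images whose sides are halved, yet nothing is lost in the round trip.
img = torch.rand(1, 3, 256, 256)
ll, lh, hl, hh = haar_dwt2(img)
rec = haar_idwt2(ll, lh, hl, hh)
print(ll.shape, torch.allclose(rec, img, atol=1e-6))  # torch.Size([1, 3, 128, 128]) True
```

Running the snippet prints the halved sub-band shape and confirms exact reconstruction up to floating-point precision, which is the property the method relies on when diffusing in the wavelet domain.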
Result
Experiments for training and testing are conducted on three shadow datasets. On the SRD (shadow removal dataset), compared with nine representative methods, the proposed method achieves the best or second-best peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and root mean square error (RMSE) in non-shadow regions and over the entire image. On the ISTD+ (augmented dataset with image shadow triplets), compared with six representative methods, the proposed method achieves the best performance in non-shadow regions, improving the PSNR and RMSE by 0.47 dB and 0.1, respectively. In addition, on the SRD dataset, the performance of ShadowDiffusion varies noticeably when images of different resolutions are generated, whereas the performance of the proposed method remains essentially stable. Moreover, the proposed method generates images about four times faster than ShadowDiffusion.
Conclusion
The proposed method accelerates the sampling of the diffusion model and, while removing shadows, restores the missing color, brightness, and rich details in shadow regions.
Objective
Shadows are a common occurrence in optical images captured under partial or complete obstruction of light. In such images, shadow regions typically exhibit various forms of degradation, such as low contrast, color distortion, and loss of scene structure. Shadows not only impact the visual perception of humans but also impose constraints on the implementation of numerous sophisticated computer vision algorithms. Shadow removal can assist in many computer vision tasks. It aims to enhance the visibility of shadow regions in images and achieve consistent illumination distribution between shadow and non-shadow regions. Currently, deep learning-based shadow removal methods can be roughly divided into two categories. The first typically utilizes deep learning to minimize the pixel-level differences between shadow regions and their corresponding non-shadow regions, aiming to learn deterministic mapping relationships between shadow and non-shadow images. However, the primary focus of shadow removal lies in locally restoring shadow regions, often overlooking the essential constraints required for effectively restoring boundaries between shadow and non-shadow regions. As a result, discrepancies in brightness exist between the restored shadow and non-shadow areas, along with the emergence of artifacts along the boundaries. The second involves using image generation models to directly model the complex distribution of shadow-free images, avoiding the direct learning of pixel-level mapping relationships, and treating shadow removal as a conditional generation task. While diffusion models have garnered significant attention due to their powerful generation capabilities, most existing diffusion generation models suffer from issues such as time-consuming image restoration and sensitivity to resolution when recovering images. Motivated by these challenges, a wavelet non-uniform diffusion model (WNDM) is proposed, which combines the advantages of wavelet decomposition and the generation ability of diffusion models to solve the above problems.
Method
First, the image is decomposed into low-frequency and high-frequency components via wavelet decomposition. Then, diffusion generation networks are designed separately for the low-frequency and high-frequency components to reconstruct the wavelet-domain distribution of shadow-free images and restore the various kinds of degraded information within these components, such as the low-frequency information (color, brightness) and the high-frequency details. The wavelet transform can decompose the image into high-frequency and low-frequency images without sacrificing information, and the spatial size of the decomposed images is halved. Thus, modeling diffusion in the wavelet domain not only greatly accelerates model inference but also captures information that may be lost in the pixel domain. Furthermore, the low-frequency and high-frequency components differ in the complexity of their distributions and in their sensitivity to noise; for example, the high-frequency components are sparse, which makes their features easier for neural networks to learn. Hence, this study devises two separate adaptive diffusion noise scheduling tables tailored to the low-frequency and high-frequency components. The branch for low-frequency diffusion adjustment independently fine-tunes the low-frequency information within shadow images, whereas the branch for high-frequency diffusion adjustment independently refines the high-frequency information within shadow images, resulting in the generation of more precise low-frequency and high-frequency images, respectively. Additionally, the low-frequency and high-frequency diffusion adjustment branches share a denoising network, thus streamlining model complexity and saving computational resources. The only difference lies in the final layer of this network, where two prediction branches are designed. Each branch consists of several stacked convolution blocks and predicts the low-frequency or high-frequency components of the shadow-free image, respectively. Finally, high-quality shadow-free images are reconstructed using the inverse wavelet transform.
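To illustrate the non-uniform part of the design, the sketch below pairs two different noise schedules with a shared denoising backbone that ends in two prediction heads, one for the low-frequency sub-band and one for the high-frequency sub-bands. It is a minimal, hypothetical stand-in: the schedule endpoints, channel counts, and layer sizes are placeholders rather than the published settings, and conditioning on the shadow image and on the diffusion timestep is omitted for brevity.

```python
import torch
import torch.nn as nn

T = 1000

def linear_beta_schedule(beta_start, beta_end, T):
    """A plain linear beta schedule; the paper uses adaptive schedules per component,
    so the endpoint values passed in below are illustrative only."""
    return torch.linspace(beta_start, beta_end, T)

# Hypothetical choice: a milder schedule for the dense low-frequency band and a more
# aggressive one for the sparse high-frequency bands.
betas_low = linear_beta_schedule(1e-4, 0.02, T)
betas_high = linear_beta_schedule(1e-4, 0.04, T)
alphas_cumprod_low = torch.cumprod(1.0 - betas_low, dim=0)
alphas_cumprod_high = torch.cumprod(1.0 - betas_high, dim=0)

def q_sample(x0, t, alphas_cumprod, noise):
    """Forward diffusion q(x_t | x_0) under a given schedule (t: LongTensor of shape [B])."""
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    return a.sqrt() * x0 + (1.0 - a).sqrt() * noise

class DualHeadDenoiser(nn.Module):
    """Toy stand-in for a shared denoising backbone whose final layer splits into two
    prediction branches: one for the LL sub-band, one for the LH/HL/HH sub-bands."""
    def __init__(self, in_ch=12, base=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_ch, base, 3, padding=1), nn.SiLU(),
            nn.Conv2d(base, base, 3, padding=1), nn.SiLU(),
        )
        self.head_low = nn.Conv2d(base, 3, 3, padding=1)   # LL: 3 channels
        self.head_high = nn.Conv2d(base, 9, 3, padding=1)  # LH/HL/HH: 3 x 3 channels

    def forward(self, x_t):
        h = self.trunk(x_t)
        return self.head_low(h), self.head_high(h)

# Noisy wavelet sub-bands of a 256 x 256 RGB image (half resolution, 4 x 3 channels).
x_t = torch.randn(2, 12, 128, 128)
pred_low, pred_high = DualHeadDenoiser()(x_t)  # (2, 3, 128, 128), (2, 9, 128, 128)
```

Under this setup, the low-frequency and high-frequency sub-bands of the shadow-free image would be noised with their own schedules via q_sample, and the two heads would be supervised against the corresponding clean sub-bands before the restored image is assembled by the inverse wavelet transform.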
Result
The experiments were conducted on three shadow removal datasets for training and testing. On the shadow removal dataset (SRD), comparisons were made with nine state-of-the-art shadow removal algorithms. The proposed model achieved the best or second-best results in terms of peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and root mean square error (RMSE) in both non-shadow regions and the entire image. On the image shadow triplets dataset (ISTD), the performance was the best in non-shadow regions, with improvements of 0.36 dB in PSNR, 0.004 in SSIM, and 0.04 in RMSE compared with the second-best model, and it ranked second across all metrics for the entire image. On the augmented dataset with image shadow triplets (ISTD+), compared with six state-of-the-art shadow removal algorithms, the performance was the best in non-shadow regions, with improvements of 0.47 dB in PSNR and 0.1 in RMSE. Additionally, for the advanced shadow removal diffusion model ShadowDiffusion, the RMSE for the entire image on the SRD dataset was 3.63 when generating images at 256 × 256 pixels, but its performance dropped significantly when generating images at the original resolution of 840 × 640 pixels, with the RMSE increasing to 7.19. By contrast, our approach yielded RMSE values of 3.80 and 4.06 for images of 256 × 256 pixels and 840 × 640 pixels, respectively, showing consistent performance across resolutions. Moreover, the time required to generate a single original image of 840 × 640 pixels was roughly one quarter of that needed by ShadowDiffusion. Furthermore, our method was extended to the image raindrop removal task, delivering competitive results on the RainDrop dataset.
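The region-wise numbers above are obtained by restricting each metric to the shadow mask, its complement, or the whole image. The snippet below is a simplified sketch of such masked evaluation in RGB; published shadow removal benchmarks differ in details such as color space and resizing, so these helpers are illustrative rather than the exact protocol used here, and the arrays are random stand-ins.

```python
import numpy as np

def masked_rmse(pred, gt, mask):
    """RMSE over the pixels selected by mask (1 = evaluate, 0 = ignore).
    pred, gt: float arrays of shape (H, W, 3) in [0, 255]; mask: (H, W)."""
    m = np.broadcast_to(mask[..., None].astype(bool), pred.shape)
    return float(np.sqrt(((pred - gt) ** 2)[m].mean()))

def masked_psnr(pred, gt, mask, peak=255.0):
    """PSNR = 20*log10(peak) - 10*log10(MSE), restricted to the same masked pixels."""
    m = np.broadcast_to(mask[..., None].astype(bool), pred.shape)
    mse = ((pred - gt) ** 2)[m].mean()
    return float(20 * np.log10(peak) - 10 * np.log10(mse + 1e-12))

# Hypothetical tensors standing in for a restored image, its ground truth, and a shadow mask.
H, W = 256, 256
pred = np.random.rand(H, W, 3) * 255
gt = np.random.rand(H, W, 3) * 255
shadow_mask = np.zeros((H, W))
shadow_mask[64:192, 64:192] = 1  # pretend this rectangle is the annotated shadow region

for name, region in [("shadow", shadow_mask),
                     ("non-shadow", 1 - shadow_mask),
                     ("all", np.ones((H, W)))]:
    print(f"{name}: RMSE={masked_rmse(pred, gt, region):.2f}, "
          f"PSNR={masked_psnr(pred, gt, region):.2f} dB")
```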
Conclusion
In this paper, the proposed method accelerates the sampling of the diffusion model. While removing shadows, it restores the missing color, brightness, and rich details in shadow regions. It treats shadow removal as an image generation task in the wavelet domain and designs two adaptive diffusion flows for the low-frequency and high-frequency components of the image in the wavelet domain to address the degradation of low-frequency information (color, brightness) and high-frequency details caused by shadows. Benefiting from the frequency decomposition of the wavelet transform, WNDM does not learn in the entangled pixel domain; instead, it effectively separates the low-frequency and high-frequency components and trains them separately, thereby generating more refined low-frequency and high-frequency information for reconstructing the final image. Extensive experiments on multiple datasets demonstrate the effectiveness of WNDM, which achieves competitive results compared with state-of-the-art methods.
shadow removal; diffusion model (DM); wavelet transform; dual-branch network; noise schedule
Chen T. 2023. On the importance of noise scheduling for diffusion models [EB/OL]. [2024-01-08]. https://arxiv.org/pdf/2301.10972.pdf
Cun X, Pun C M and Shi C. 2020. Towards ghost-free shadow removal via dual hierarchical aggregation network and shadow matting GAN//Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI Press: 10680-10687 [DOI: 10.1609/aaai.v34i07.6695]
Finlayson G D, Hordley S D, Lu C and Drew M S. 2006. On the removal of shadows from images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(1): 59-68 [DOI: 10.1109/TPAMI.2006.18]
Fu L, Zhou C Q, Guo Q, Juefei-Xu F, Yu H K, Feng W, Liu Y and Wang S. 2021. Auto-exposure fusion for single-image shadow removal//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 10566-10575 [DOI: 10.1109/cvpr46437.2021.01043]
Guo L Q, Huang S Y, Liu D, Cheng H and Wen B H. 2023a. ShadowFormer: global context helps image shadow removal [EB/OL]. [2024-01-08]. https://arxiv.org/pdf/2302.01650.pdf
Guo L Q, Wang C, Yang W H, Huang S Y, Wang Y F, Pfister H and Wen B H. 2023b. ShadowDiffusion: when degradation prior meets diffusion model for shadow removal//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 14049-14058 [DOI: 10.1109/cvpr52729.2023.01350]
Guo R Q, Dai Q Y and Hoiem D. 2013. Paired regions for shadow detection and removal. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(12): 2956-2967 [DOI: 10.1109/TPAMI.2012.214]
Haar A. 1911. Zur theorie der orthogonalen funktionensysteme. Mathematische Annalen, 71(1): 38-53 [DOI: 10.1007/bf01456927]
Ho J, Jain A and Abbeel P. 2020. Denoising diffusion probabilistic models//Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc.: 6840-6851
Hu M D, Wu Y, Song Y, Yang J B, Zhang R F, Wang H and Meng D Y. 2022. The integrated evaluation and review of single image rain removal based datasets and deep learning methods. Journal of Image and Graphics, 27(5): 1359-1391 [DOI: 10.11834/jig.211153]
Hu X W, Fu C W, Zhu L, Qin J and Heng P A. 2020. Direction-aware spatial context features for shadow detection and removal. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(11): 2795-2808 [DOI: 10.1109/TPAMI.2019.2919616]
Hu X W, Jiang Y T, Fu C W and Heng P A. 2019. Mask-shadowGAN: learning to remove shadows from unpaired data//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 2472-2481 [DOI: 10.1109/iccv.2019.00256]
Jiang H, Luo A, Fan H Q, Han S C and Liu S C. 2023. Low-light image enhancement with wavelet-based diffusion models. ACM Transactions on Graphics (TOG), 42(6): #238 [DOI: 10.1145/3618373]
Jin Y Y, Sharma A and Tan R T. 2021. Dc-ShadowNet: single-image hard and soft shadow removal using unsupervised domain-classifier guided network//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 5007-5016 [DOI: 10.1109/iccv48922.2021.00498]
Khan S H, Bennamoun M, Sohel F and Togneri R. 2016. Automatic shadow detection and removal from a single image. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(3): 431-446 [DOI: 10.1109/TPAMI.2015.2462355]
Le H and Samaras D. 2019. Shadow removal via shadow image decomposition//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 8577-8586 [DOI: 10.1109/iccv.2019.00867]
Li H Y, Yang Y F, Chang M, Chen S Q, Feng H J, Xu Z H, Li Q and Chen Y T. 2022. SRDiff: single image super-resolution with diffusion probabilistic models. Neurocomputing, 479: 47-59 [DOI: 10.1016/j.neucom.2022.01.029]
Liu F and Gleicher M. 2008. Texture-consistent shadow removal//Proceedings of the 10th European Conference on Computer Vision. Marseille, France: Springer: 437-450 [DOI: 10.1007/978-3-540-88693-8_32]
Long C J and Hua G. 2017. Correlational Gaussian processes for cross-domain visual recognition//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 4932-4940 [DOI: 10.1109/CVPR.2017.524]
Lugmayr A, Danelljan M, Van Gool L and Timofte R. 2020. SRFlow: learning the super-resolution space with normalizing flow//Proceedings of the 16th European Conference on Computer Vision — ECCV 2020. Glasgow, UK: Springer: 715-732 [DOI: 10.1007/978-3-030-58558-7_42]
Mikic I, Cosman P C, Kogut G T and Trivedi M M. 2000. Moving shadow and object detection in traffic scenes//Proceedings of the 15th International Conference on Pattern Recognition. Barcelona, Spain: IEEE: 321-324 [DOI: 10.1109/ICPR.2000.905341]
Nichol A Q and Dhariwal P. 2021. Improved denoising diffusion probabilistic models [EB/OL]. [2024-01-08]. https://arxiv.org/pdf/2102.09672.pdf
Özdenizci O and Legenstein R. 2023. Restoring vision in adverse weather conditions with patch-based denoising diffusion models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(8): 10346-10357 [DOI: 10.1109/TPAMI.2023.3238179]
Phung H, Dao Q and Tran A. 2023. Wavelet diffusion models are fast and scalable image generators//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 10199-10208 [DOI: 10.1109/cvpr52729.2023.00983]
Qian R, Tan R T, Yang W H, Su J J and Liu J Y. 2018. Attentive generative adversarial network for raindrop removal from a single image//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 2482-2491 [DOI: 10.1109/cvpr.2018.00263]
Qu L Q, Tian J D, He S F, Tang Y D and Lau R W H. 2017. DeshadowNet: a multi-context embedding deep network for shadow removal//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2308-2316 [DOI: 10.1109/CVPR.2017.248]
Quan R J, Yu X, Liang Y Z and Yang Y. 2021. Removing raindrops and rain streaks in one go//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 9143-9152 [DOI: 10.1109/cvpr46437.2021.00903]
Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer: 234-241 [DOI: 10.1007/978-3-319-24574-4_28]
Saharia C, Chan W, Chang H W, Lee C, Ho J, Salimans T, Fleet D and Norouzi M. 2022. Palette: image-to-image diffusion models//Proceedings of 2022 ACM SIGGRAPH Conference. Vancouver, Canada: ACM: #15 [DOI: 10.1145/3528233.3530757]
Saharia C, Ho J, Chan W, Salimans T, Fleet D J and Norouzi M. 2023. Image super-resolution via iterative refinement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4): 4713-4726 [DOI: 10.1109/TPAMI.2022.3204461]
Sohl-Dickstein J, Weiss E A, Maheswaranathan N and Ganguli S. 2015. Deep unsupervised learning using nonequilibrium thermodynamics//Proceedings of the 32nd International Conference on International Conference on Machine Learning. Lille, France: JMLR.org: 2256-2265
Song J M, Meng C L and Ermon S. 2021. Denoising diffusion implicit models [EB/OL]. [2024-01-08]. https://arxiv.org/pdf/2010.02502v2.pdf
Tang L, Zhao C X, Wang H N and Shao W Z. 2008. Shadow removal for road surface images based on anisotropic diffusion Retinex. Journal of Image and Graphics, 13(2): 264-268 [DOI: 10.11834/jig.20080215]
Wang J F, Li X and Yang J. 2018. Stacked conditional generative adversarial networks for jointly learning shadow detection and shadow removal//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 1788-1797 [DOI: 10.1109/CVPR.2018.00192]
Wang K F, Gou C, Duan Y J, Lin Y L, Zheng X H and Wang F Y. 2017. Generative adversarial networks: introduction and outlook. IEEE/CAA Journal of Automatica Sinica, 4(4): 588-598 [DOI: 10.1109/JAS.2017.7510583]
Whang J, Delbracio M, Talebi H, Saharia C, Dimakis A G and Milanfar P. 2022. Deblurring via stochastic refinement//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 16272-16282 [DOI: 10.1109/CVPR52688.2022.01581]
Xiao J, Fu X Y, Liu A P, Wu F and Zha Z J. 2023. Image De-raining Transformer. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(11): 12978-12995 [DOI: 10.1109/tpami.2022.3183612]
Yi X P, Xu H, Zhang H, Tang L F and Ma J Y. 2023. Diff-retinex: rethinking low-light image enhancement with a generative diffusion model [EB/OL]. [2024-01-08]. https://arxiv.org/pdf/2308.13164.pdf
Yue Z S, Wang J Y and Loy C C. 2023. ResShift: efficient diffusion model for image super-resolution by residual shifting [EB/OL]. [2024-01-08]. https://arxiv.org/pdf/2307.12348.pdf
Zhu Y R, Huang J, Fu X Y, Zhao F, Sun Q B and Zha Z J. 2022a. Bijective mapping network for shadow removal//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 5617-5626 [DOI: 10.1109/cvpr52688.2022.00554]
Zhu Y R, Xiao Z Y, Fang Y C, Fu X Y, Xiong Z W and Zha Z J. 2022b. Efficient model-driven network for shadow removal//Proceedings of the 36th AAAI Conference on Artificial Intelligence. Virtually: AAAI Press: 3635-3643 [DOI: 10.1609/aaai.v36i3.20276]