FDM: Fuzzy Diffusion Model for Seen-through Document Images Restoration
2024, pp. 1-13
Online publication date: 2024-09-18
DOI: 10.11834/jig.240350
Wang Yijie, Gong Jiaxin, Liang Zongbao, et al. FDM: Fuzzy Diffusion Model for Seen-through Document Images Restoration [J]. Journal of Image and Graphics, 2024: 1-13
Objective
When a document is digitally imaged, factors such as ink density and paper transparency can make the content on the reverse side visible through the paper, producing a seen-through phenomenon in the digital image that hampers its practical use. To address this phenomenon, this paper proposes a fuzzy diffusion model that combines fuzzy logic with the idea of mean reversion. Without requiring any prior knowledge, it strengthens the diffusion model's ability to handle random factors in document images, not only removing the seen-through phenomenon but also improving the visual quality of the images.
Method
The proposed method continuously adds random noise through a mean-reverting stochastic differential equation to degrade the original image, transforming it into a seen-through mean state with fixed Gaussian noise. A fuzzy-logic operation is then introduced into the noise network to infer the membership degree of each pixel, allowing the model to better learn the noise and data distributions. In the reverse process, the low-quality image is gradually restored by simulating the corresponding reverse-time stochastic differential equation, yielding a clean image free of seen-through artifacts.
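The forward and reverse processes above can be sketched with simple Euler-Maruyama discretizations of the mean-reverting SDE dx = theta*(mu - x)dt + sigma*dW. The NumPy sketch below is only illustrative: the reversion rate theta, noise scale sigma, step size dt, and the zero placeholder used in place of the learned score network are all assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_sde(x0, mu, theta=0.5, sigma=0.2, n_steps=100, dt=0.05):
    """Euler-Maruyama discretization of the mean-reverting SDE
    dx = theta*(mu - x)dt + sigma*dW: the clean image x0 drifts
    toward the seen-through mean state mu while noise accumulates."""
    x = x0.astype(float).copy()
    for _ in range(n_steps):
        dw = np.sqrt(dt) * rng.standard_normal(x.shape)
        x = x + theta * (mu - x) * dt + sigma * dw
    return x

def reverse_step(x, mu, score, theta=0.5, sigma=0.2, dt=0.05):
    """One reverse-time Euler-Maruyama step; `score` stands in for
    the learned score function grad_x log p_t(x) that the trained
    noise network would provide."""
    dw = np.sqrt(dt) * rng.standard_normal(x.shape)
    drift = theta * (mu - x) - sigma ** 2 * score
    return x - drift * dt + sigma * dw

x0 = np.zeros((4, 4))   # stand-in "clean" image
mu = np.ones((4, 4))    # stand-in seen-through mean state
xT = forward_sde(x0, mu)                          # degraded state near mu
x_prev = reverse_step(xT, mu, np.zeros_like(xT))  # one denoising step
```

With these illustrative parameters the forward process pulls the image mean close to the mean state mu, matching the "seen-through mean state with fixed Gaussian noise" described above.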
Result
The proposed algorithm was trained on a synthetic grayscale dataset and a synthetic color dataset, and tested on three synthetic datasets and two real datasets against five representative existing methods. It achieved the best visual quality and, to some extent, also removed the noise present in the original images, obtaining the best results on all four evaluation metrics: peak signal-to-noise ratio (PSNR), structural similarity (SSIM), learned perceptual image patch similarity (LPIPS), and Fréchet inception distance (FID).
Conclusion
The proposed method effectively removes the seen-through phenomenon from different types of document images, improving the accuracy and efficiency of the document image seen-through removal task, and is expected to be integrated into practical hardware devices such as cameras and scanners.
Objective
Document images have significant applications across various fields, such as optical character recognition (OCR), historical document restoration, and electronic reading. However, when scanning or photographing a document, factors like ink density and paper transparency may cause the content on the reverse side to become visible through the paper, producing a digital image with a "seen-through" phenomenon that hinders practical applications. Additionally, the image acquisition process is often affected by various sources of uncertainty, including differences in camera performance, paper quality, lighting conditions, lens shake, and variations in the physical properties of the documents themselves. These random factors contribute noise to document images and may compound the seen-through phenomenon, thereby impairing subsequent tasks such as text recognition, word identification, and layout analysis. Notably, while restoring the textual content of document images is important, the backgrounds of many color document images also carry valuable information, so recovering color images with complex backgrounds affected by the seen-through phenomenon presents its own challenges. Although existing methods for removing seen-through effects from document images have made progress in improving image quality, no algorithm has yet been developed that is specifically tailored to handle variations in the degree of seen-through effects, complex background colors, and the influence of uncertainty factors. To address these issues, this paper develops a comprehensive algorithm for the diverse seen-through problems in regular document images, handwritten document images, and color document images. Consequently, we propose the Fuzzy Diffusion Model (FDM), which integrates fuzzy logic with conditional diffusion models.
The objective of this algorithm is to restore document images affected by various types and degrees of seen-through phenomena, introducing a novel approach to document image enhancement and restoration.
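To make the degradation concrete, a seen-through image can be viewed as the front page darkened by the mirrored ink of the back page. The NumPy sketch below is a generic formation model under that assumption; the linear blending rule and the bleed-strength parameter alpha are hypothetical and are not the synthesis protocol proposed in the paper.

```python
import numpy as np

def synthesize_seen_through(front, back, alpha=0.3):
    """Generic seen-through formation model: the horizontally
    mirrored back page bleeds through the paper and darkens the
    front page. Images are floats in [0, 1]; `alpha` is a
    hypothetical bleed-strength parameter."""
    mirrored = back[:, ::-1]      # the verso appears mirrored on the recto
    ink = 1.0 - mirrored          # ink density of the back page
    return np.clip(front - alpha * ink, 0.0, 1.0)

front = np.ones((4, 4))           # blank white front page
back = np.zeros((4, 4))           # fully inked back page
degraded = synthesize_seen_through(front, back)   # uniformly darkened front
```

A blank back page (all ones) leaves the front page untouched, while heavier back-page ink darkens the front more, mimicking varying degrees of the seen-through effect.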
Method
The overall process can be divided into a forward diffusion process and a corresponding reverse denoising process. First, we gradually add continuous Gaussian noise to the input image using a mean-reverting stochastic differential equation (SDE), producing a seen-through mean state with fixed Gaussian noise. Subsequently, we train a neural network to progressively predict the noise at the current time step from the noise-added image and to estimate the score function from the predicted noise. Finally, in the reverse process, we gradually restore the low-quality image by simulating the corresponding reverse-time SDE until a clean image without seen-through effects is generated. To handle the uncertainty factors in document images, we design a fuzzy block in the skip-connection part of the noise network to compute the membership degree of each pixel in the image. Specifically, the fuzzy operation uses the 3×3 neighborhood of nine pixels, including the pixel itself, and the final membership degree of the pixel is obtained through fuzzy inference. Notably, our network draws on the U-Net structure of the denoising diffusion probabilistic model (DDPM), except that we remove all group normalization and self-attention layers to improve inference efficiency. In the middle part, we introduce atrous spatial pyramid pooling (ASPP) to enlarge the receptive field and extract richer features. Because matched pairs of seen-through images are hard to find in the real world, we propose a new protocol for synthesizing seen-through images. During training, we feed the seen-through image as conditional information, together with the noise-added image, into the noise network so that the model learns the target distribution in a directed manner.
After the model is trained, the seen-through image is used as conditional input to progressively predict the noise in the noisy image, generating a clean document image.
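The fuzzy block described above can be sketched as follows. The abstract only states that nine pixels (the pixel plus its 3×3 neighborhood) are combined by fuzzy inference, so the Gaussian membership function and the mean aggregation in this NumPy sketch are illustrative assumptions rather than the paper's exact design.

```python
import numpy as np

def gaussian_membership(diff, sigma=0.5):
    """Membership degree of a neighbor given its intensity
    difference from the center pixel (an illustrative choice)."""
    return np.exp(-(diff ** 2) / (2 * sigma ** 2))

def fuzzy_block(img, sigma=0.5):
    """Per-pixel membership map from the 3x3 neighborhood (the
    pixel itself plus its 8 neighbors): fuzzify each neighbor's
    intensity difference, then aggregate the nine membership
    degrees by their mean as a simple fuzzy inference."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")  # replicate borders
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 3, j:j + 3]
            center = padded[i + 1, j + 1]
            out[i, j] = gaussian_membership(patch - center, sigma).mean()
    return out

memberships = fuzzy_block(np.full((5, 5), 0.5))  # uniform region -> all ones
```

Under this choice, pixels in smooth regions receive membership close to 1, while pixels near edges or noise receive lower membership, which is the kind of per-pixel uncertainty signal the fuzzy block feeds into the skip connections.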
Result
We trained our model separately on the synthetic grayscale dataset and the synthetic color dataset, and tested it on three synthetic datasets and two real datasets. The test sets include synthetic grayscale document images, synthetic color document images, synthetic handwritten document images, the MediaTeam Oulu document dataset, and real CET-6 seen-through document images. We compared our method with five representative existing methods; it achieved the best visual quality and, to some extent, eliminated the noise present in the original images. Our method achieved the best results on all four evaluation metrics: peak signal-to-noise ratio (PSNR), structural similarity (SSIM), learned perceptual image patch similarity (LPIPS), and Fréchet inception distance (FID). On the grayscale dataset, it obtained the best PSNR and FID, at 35.05 and 30.69, respectively. On the synthetic color dataset, it obtained the best SSIM, LPIPS, and FID, at 0.986, 0.0053, and 20.03, respectively. To validate the stability of the proposed method, we also report the variance of the SSIM metric, where our method achieves the best value of 0.0053.
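Of the metrics above, PSNR has a simple closed form that is easy to reproduce. The sketch below implements the standard definition 10*log10(R^2 / MSE) in NumPy; it is the textbook formula, not the paper's evaluation code.

```python
import numpy as np

def psnr(ref, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB: 10*log10(R^2 / MSE),
    where R is the dynamic range of the images."""
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    if mse == 0.0:
        return float("inf")   # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)

clean = np.zeros((8, 8))
noisy = clean + 0.1           # uniform error of 0.1 -> MSE = 0.01
value = psnr(clean, noisy)    # 10*log10(1 / 0.01) = 20 dB
```

Higher PSNR is better; note that for LPIPS and FID, in contrast, lower values indicate better restoration quality.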
Conclusion
The proposed FDM effectively addresses the main challenges in removing seen-through effects from document images: the lack of paired seen-through document images, residual seen-through effects, the difficulty of handling complex backgrounds, and the uniform treatment of uncertainty factors in images. As a result, it can effectively remove the seen-through phenomenon from different types of document images, enhancing the accuracy and efficiency of the document image seen-through removal task. It is expected to be integrated into practical hardware devices such as cameras and scanners.
Diffusion models; fuzzy logic; image restoration; seen-through removal; stochastic differential equations
Anderson B D O. 1982. Reverse-time diffusion equation models. Stochastic Processes and their Applications, 12(3): 313-326 [DOI: 10.1016/0304-4149(82)90051-5]
De R, Chakraborty A and Sarkar R. 2020. Document image binarization using dual discriminator generative adversarial networks. IEEE Signal Processing Letters, 27: 1090-1094 [DOI: 10.1109/LSP.2020.3003828]
Fei B, Lyu Z, Pan L, Zhang J, Yang W, Luo T, Zhang B and Dai B. 2023. Generative diffusion prior for unified image restoration and enhancement//Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, BC, Canada: IEEE: 9935-9946 [DOI: 10.1109/CVPR52729.2023.00958]
Feng X, Pei W, Jia Z, Chen F, Zhang D and Lu D. 2021. Deep-masking generative network: a unified framework for background restoration from superimposed images. IEEE Transactions on Image Processing, 30: 4867-4882 [DOI: 10.1109/TIP.2021.3076589]
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A and Bengio Y. 2014. Generative adversarial nets//Proceedings of the 27th International Conference on Neural Information Processing Systems: 2672-2680
Heusel M, Ramsauer H, Unterthiner T, Nessler B and Hochreiter S. 2017. GANs trained by a two time-scale update rule converge to a local Nash equilibrium//Proceedings of the 31st International Conference on Neural Information Processing Systems: 6629-6640 [DOI: 10.48550/arXiv.1706.08500]
Ho J, Jain A and Abbeel P. 2020. Denoising diffusion probabilistic models//Advances in Neural Information Processing Systems, 33: 6840-6851
Hyvärinen A and Dayan P. 2005. Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6: 695-709
Jemni S K, Souibgui M A, Kessentini Y and Fornés A. 2022. Enhance to read better: a multi-task adversarial network for handwritten document image enhancement. Pattern Recognition, 123: 108370 [DOI: 10.1016/j.patcog.2021.108370]
Kingma D, Salimans T, Poole B and Ho J. 2021. Variational diffusion models//Proceedings of the 35th International Conference on Neural Information Processing Systems. Red Hook, NY, USA: 21696-21707 [DOI: 10.48550/arXiv.2107.00630]
Kingma D P and Welling M. 2014. Auto-encoding variational Bayes//Proceedings of the 2nd International Conference on Learning Representations [DOI: 10.48550/arXiv.1312.6114]
Li H, Yang Y, Chang M, Feng H, Xu Z, Li Q and Chen Y. 2022. SRDiff: single image super-resolution with diffusion probabilistic models. Neurocomputing, 479: 47-59 [DOI: 10.1016/j.neucom.2022.01.029]
Li Y, Liu M, Yi Y, Li Q, Ren D and Zuo W. 2023. Two-stage single image reflection removal with reflection-aware guidance. Applied Intelligence, 53(16): 19433-19448 [DOI: 10.1007/s10489-022-04391-6]
Liu Y, Liu F, Ke Z, Zhao N and Lau R W. 2024. Diff-Plugin: revitalizing details for diffusion-based low-level tasks//Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE [DOI: 10.48550/arXiv.2403.00644]
Liu Y, Zhao Q, Pan F, Gao D and Danzeng P. 2023. Structure prior guided text image inpainting model. Journal of Image and Graphics, 28(12): 3699-3712 [DOI: 10.11834/jig.220960]
Luo Z, Gustafsson F K, Zhao Z, Sjölund J and Schön T B. 2023. Image restoration with mean-reverting stochastic differential equations//Proceedings of the 40th International Conference on Machine Learning. ACM: 23045-23066 [DOI: 10.48550/arXiv.2301.11699]
Mehri A, Ardakani P B and Sappa A D. 2021. MPRNet: multi-path residual network for lightweight image super resolution//Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision. Waikoloa, HI, USA: IEEE: 2704-2713 [DOI: 10.1109/WACV48630.2021.00275]
Moghaddam R F and Cheriet M. 2009. A variational approach to degraded document enhancement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(8): 1347-1361 [DOI: 10.1109/TPAMI.2009.141]
Özdenizci O and Legenstein R. 2023. Restoring vision in adverse weather conditions with patch-based denoising diffusion models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(8): 10346-10357 [DOI: 10.1109/TPAMI.2023.3238179]
Quan Y, Yang J, Chen Y, Xu Y and Ji H. 2020. Collaborative deep learning for super-resolving blurry text images. IEEE Transactions on Computational Imaging, 6: 778-790 [DOI: 10.1109/TCI.2020.2981758]
Rombach R, Blattmann A, Lorenz D, Esser P and Ommer B. 2022. High-resolution image synthesis with latent diffusion models//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, LA, USA: IEEE: 10684-10695 [DOI: 10.1109/CVPR52688.2022.01042]
Rowley-Brooke R, Pitié F and Kokaram A. 2013. A non-parametric framework for document bleed-through removal//Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR, USA: IEEE: 2954-2960 [DOI: 10.1109/CVPR.2013.380]
Saharia C, Ho J, Chan W, Salimans T, Fleet D J and Norouzi M. 2022. Image super-resolution via iterative refinement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4): 4713-4726 [DOI: 10.1109/TPAMI.2022.3204461]
Sauvola J and Kauniskangas H. 1999. MediaTeam Document Database II, a CD-ROM collection of document images. University of Oulu, Finland
Shen Y, Wei M, Wang Y, Fu X and Qin J. 2024. Rethinking real-world image deraining via an unpaired degradation-conditioned diffusion model//Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE [DOI: 10.48550/arXiv.2301.09430]
Silva J M, Lins R D and Silva G P. 2009. Enhancing the quality of color documents with back-to-front interference//Proceedings of the 6th International Conference on Image Analysis and Recognition. Halifax, Canada: Springer Berlin Heidelberg: 875-885 [DOI: 10.1007/978-3-642-02611-9_86]
Song J, Meng C and Ermon S. 2020. Denoising diffusion implicit models//Proceedings of the International Conference on Learning Representations [DOI: 10.48550/arXiv.2010.02502]
Song Y and Ermon S. 2019. Generative modeling by estimating gradients of the data distribution//Proceedings of the Neural Information Processing Systems: 11918-11930 [DOI: 10.48550/arXiv.1907.05600]
Song Y and Ermon S. 2020. Improved techniques for training score-based generative models//Proceedings of the Neural Information Processing Systems: 12438-12448 [DOI: 10.48550/arXiv.2006.09011]
Song Y, Garg S, Shi J and Ermon S. 2020. Sliced score matching: a scalable approach to density and score estimation//Proceedings of the 35th Conference on Uncertainty in Artificial Intelligence. PMLR: 574-584 [DOI: 10.48550/arXiv.1905.07088]
Song Y, Sohl-Dickstein J, Kingma D P, Kumar A, Ermon S and Poole B. 2021. Score-based generative modeling through stochastic differential equations//Proceedings of the International Conference on Learning Representations [DOI: 10.48550/arXiv.2011.13456]
Souibgui M A, Biswas S, Jemni S K, Kessentini Y, Fornés A, Lladós J and Pal U. 2022. DocEnTr: an end-to-end document image enhancement transformer//Proceedings of the 26th International Conference on Pattern Recognition (ICPR). India: IEEE: 1699-1705 [DOI: 10.1109/ICPR56361.2022.9956101]
Souibgui M A and Kessentini Y. 2020. DE-GAN: a conditional generative adversarial network for document enhancement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(3): 1180-1191 [DOI: 10.1109/TPAMI.2020.3022406]
Sun B, Li S, Zhang X P and Sun J. 2016. Blind bleed-through removal for scanned historical document image with conditional random fields. IEEE Transactions on Image Processing, 25(12): 5702-5712 [DOI: 10.1109/TIP.2016.2614133]
Sun X, Xu J, Ma Y, Zhao T, Ou S and Peng L. 2022. Blind image separation based on attentional generative adversarial network. Journal of Ambient Intelligence and Humanized Computing, 13(3): 1397-1404 [DOI: 10.1007/s12652-020-02637-0]
Tan C L, Cao R and Shen P. 2002. Restoration of archival documents using a wavelet technique. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(10): 1399-1404 [DOI: 10.1109/TPAMI.2002.1039211]
Tonazzini A, Salerno E and Bedini L. 2006. Fast correction of bleed-through distortion in grayscale documents by a blind source separation technique. International Journal on Document Analysis and Recognition (IJDAR), 10(1): 17-25 [DOI: 10.1007/s10032-006-0015-z]
Whang J, Delbracio M, Talebi H, Saharia C, Dimakis A G and Milanfar P. 2022. Deblurring via stochastic refinement//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, LA, USA: IEEE: 16293-16303 [DOI: 10.1109/CVPR52688.2022.01581]
Wang Z, Bovik A C, Sheikh H R and Simoncelli E P. 2004. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4): 600-612 [DOI: 10.1109/TIP.2003.819861]
Wang Z, Huo G, Lan H, Hu J and Wei X. 2024. Fundus image enhancement algorithm based on convolutional dictionary diffusion model. Journal of Image and Graphics, 29(08): 2426-2438 [DOI: 10.11834/jig.230595]
Wolf C. 2009. Document ink bleed-through removal with two hidden Markov random fields and a single observation field. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(3): 431-447 [DOI: 10.1109/TPAMI.2009.33]
Xu J, Ma Y, Liang Z and Ni M. 2023. Single bleed-through image restoration with self-supervised learning. Acta Automatica Sinica, 49(1): 219-228 [DOI: 10.16383/j.aas.c220165]
Yang Z, Liu B, Xiong Y, Yi L, Wu G, Tang X, Liu Z, Zhou J and Zhang X. 2023. DocDiff: document enhancement via residual diffusion models//Proceedings of the 31st ACM International Conference on Multimedia. Canada: ACM: 2795-2806 [DOI: 10.1145/3581783.3611730]
Zhang L, Rao A and Agrawala M. 2023. Adding conditional control to text-to-image diffusion models//Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision. IEEE: 3836-3847 [DOI: 10.1109/ICCV51070.2023.00355]
Zhang J, Rimchala J, Mouatadid L, Das K and Kumar S. 2024. DECDM: document enhancement using cycle-consistent diffusion models//Proceedings of the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision. IEEE: 8036-8045 [DOI: 10.1109/WACV57701.2024.00785]
Zhao J, Shi C, Jia F, Wang Y and Xiao B. 2019. Document image binarization with cascaded generators of conditional generative adversarial networks. Pattern Recognition, 96: 106968 [DOI: 10.1016/j.patcog.2019.106968]
Zhang G, Ji J, Zhang Y, Yu M, Jaakkola T and Chang S. 2023. Towards coherent image inpainting using denoising diffusion implicit models//Proceedings of the International Conference on Machine Learning. ACM: 41164-41193 [DOI: 10.48550/arXiv.2304.03322]
Zhang R, Isola P, Efros A A, Shechtman E and Wang O. 2018. The unreasonable effectiveness of deep features as a perceptual metric//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE: 586-595 [DOI: 10.1109/CVPR.2018.00068]
Zheng D, Wu X M, Yang S, Zhang J, Hu J and Zheng W. 2024. Selective hourglass mapping for universal image restoration based on diffusion model//Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE [DOI: 10.48550/arXiv.2403.11157]
Zheng Q, Shi B, Chen J, Jiang X, Duan L Y and Kot A C. 2021. Single image reflection removal with absorption effect//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, TN, USA: IEEE: 13395-13404 [DOI: 10.1109/CVPR46437.2021.01319]
Zou Z, Lei S, Shi T, Shi Z and Ye J. 2020. Deep adversarial decomposition: a unified framework for separating superimposed images//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA: IEEE: 12806-12816 [DOI: 10.1109/CVPR42600.2020.01282]