基于卷积字典扩散模型的眼底图像增强算法
Fundus image enhancement algorithm based on convolutional dictionary diffusion model
- 2024年29卷第8期 页码:2426-2438
纸质出版日期: 2024-08-16
DOI: 10.11834/jig.230595
王珍, 霍光磊, 兰海, 胡建民, 魏宪. 2024. 基于卷积字典扩散模型的眼底图像增强算法. 中国图象图形学报, 29(08):2426-2438
Wang Zhen, Huo Guanglei, Lan Hai, Hu Jianmin, Wei Xian. 2024. Fundus image enhancement algorithm based on convolutional dictionary diffusion model. Journal of Image and Graphics, 29(08):2426-2438
目的
视网膜眼底图像广泛用于临床筛查和诊断眼科疾病,但由于散焦、光线条件不佳等引起的眼底图像模糊,导致医生无法正确诊断,且现有图像增强方法恢复的图像仍存在模糊、高频信息缺失以及噪点增多问题。本文提出了一个卷积字典扩散模型,将卷积字典学习的去噪能力与条件扩散模型的灵活性相结合,从而解决了上述问题。
方法
算法主要包括两个过程:扩散过程和去噪过程。首先向输入图像中逐步添加随机噪声,得到趋于纯粹噪声的图像;然后训练一个神经网络逐渐将噪声从图像中移除,直到获得一幅清晰图像。本文利用卷积网络来实现卷积字典学习并获取图像稀疏表示,该算法充分利用图像的先验信息,有效避免重建图像高频信息缺失和噪点增多的问题。
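上述扩散过程中的“逐步添加随机噪声”在实现时通常可由闭式公式一步采样得到任意时刻 t 的噪声图像。下面给出一个最小的 Python/NumPy 示意(采用线性 beta 调度;调度范围、步数等均为示意性假设值,并非本文原始超参数):

```python
import numpy as np

def q_sample(x0, t, betas, rng=np.random.default_rng(0)):
    """闭式采样 x_t ~ q(x_t | x_0):逐步加噪等价于一步采样
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps。"""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t]          # 累积噪声调度系数
    eps = rng.standard_normal(x0.shape)        # 标准高斯噪声
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps, eps

# 线性 beta 调度(示意取值)
betas = np.linspace(1e-4, 0.02, 1000)
x0 = np.zeros((8, 8))                          # 示意的“干净图像”
xt, eps = q_sample(x0, t=999, betas=betas)     # t 越大,越趋于纯噪声
```

训练去噪网络时,通常即以按上式采样得到的 x_t(连同模糊图像等条件信息)作为输入,学习预测所加入的噪声。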
结果
将本文模型在EyePACS数据集上进行训练,并分别在合成数据集DRIVE(digital retinal images for vessel extraction)、CHASEDB1(child heart and health study in England)、ROC(retinopathy online challenge)和真实数据集RF(real fundus)、HRF(high-resolution fundus)上进行测试,验证了所提方法在图像增强任务上的性能及跨数据集的泛化能力。与原始扩散模型(learning enhancement from degradation,Led)相比,其峰值信噪比(peak signal-to-noise ratio,PSNR)和学习感知图像块相似度(learned perceptual image patch similarity,LPIPS)两项评价指标平均分别提升1.992 9 dB和0.028 9。此外,将本文方法用于真实眼科图像下游任务的前处理能够有效提升下游任务的表现:在含有分割标签的DRIVE数据集上进行的视网膜血管分割实验结果显示,相较于原始扩散模型,其受试者工作特征曲线下面积(area under the curve,AUC)、准确率(accuracy,Acc)和敏感性(sensitivity,Sen)平均分别提升0.031 4、0.003 0和0.073 8。
结论
提出的方法能够在保留真实眼底特征的同时去除模糊、恢复更丰富的细节,从而有利于临床图像的分析和应用。
Objective
Retinal fundus images have important clinical applications in ophthalmology. They are used to screen and diagnose various ophthalmic diseases, such as diabetic retinopathy, macular degeneration, and glaucoma. However, their acquisition is often affected by various factors in real scenarios, including lens defocus, poor ambient light conditions, patient eye movements, and camera performance. These issues often lead to quality problems such as blurriness, unclear details, and unavoidable noise in fundus images. Such poor-quality images pose a challenge to ophthalmologists in their diagnostic work. For example, blurred images lack detailed information about the morphological structure of the retina, which makes it difficult for physicians to accurately localize and identify abnormalities, lesions, exudates, and other conditions. Existing enhancement methods for fundus images have made progress in improving image quality, but problems remain, such as image blurring, artifacts, missing high-frequency information, and increased noise. Therefore, in this study, we propose a convolutional dictionary diffusion model, which combines convolutional dictionary learning with a conditional diffusion model. The algorithm aims to cope with the abovementioned problems of low-quality images and to provide an effective tool for fundus image enhancement. Our approach can improve the quality of fundus images, enabling physicians to increase diagnostic confidence, improve assessment accuracy, monitor treatment progress, and ensure better care for patients. This method will contribute to ophthalmic research and provide more opportunities for prospective healthcare management and medical intervention, positively impacting patients' ocular health and overall quality of life.
Method
The algorithm consists of two parts: a forward diffusion process and a reverse denoising process. First, random noise is gradually added to the input image until it approaches pure noise. Then, a neural network is trained to gradually remove the noise until a clear image is obtained. This study takes the blurred fundus image as the conditional information to better preserve the fine-grained structure of the image. Because collecting blurred-clear fundus image pairs is difficult, synthetic fundus datasets are widely used for training; accordingly, a Gaussian filtering algorithm is designed to simulate defocus-blurred images. During training, the conditional information and the noisy image are first concatenated and fed into the network, and abstract features of the image are extracted by progressively reducing the image size through downsampling. This step significantly reduces the time and space complexity of the sparse representation computation. Then, a convolutional network is used to implement convolutional dictionary learning and obtain the sparse representation of the image. Given that the self-attention mechanism can capture non-local similarity and long-range dependencies, this study adds self-attention to the convolutional dictionary learning module to improve reconstruction quality. Finally, hierarchical feature extraction is achieved by feature concatenation to fuse information across levels and better exploit local features in the image. The downsampled features are restored to the original image size by a transposed convolutional layer. The model minimizes the negative log-likelihood loss, which measures the difference between the probability distributions of the generated image and the original image. Once trained, the model generates a clear fundus image by gradually removing the noise from a noisy input, with the blurred image as conditional input.
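The convolutional dictionary learning step admits a compact illustration: sparse codes can be obtained by alternating a gradient step on the reconstruction error with soft thresholding (ISTA). The sketch below is a minimal 1-D NumPy illustration under assumed random filters, step size, and sparsity weight; the paper's module instead unrolls such iterations as a learned convolutional network augmented with self-attention:

```python
import numpy as np

def soft_threshold(z, lam):
    # Proximal operator of the L1 penalty: drives small coefficients to zero.
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def ista_step(x, codes, filters, step=0.1, lam=0.02):
    """One ISTA iteration for min_z 0.5*||x - sum_k d_k * z_k||^2 + lam*sum_k ||z_k||_1."""
    # Current reconstruction: sum of dictionary filters convolved with their codes.
    recon = sum(np.convolve(z, d, mode="same") for z, d in zip(codes, filters))
    residual = x - recon
    # Gradient step (the adjoint of convolution is correlation), then shrinkage.
    return [soft_threshold(z + step * np.correlate(residual, d, mode="same"), lam)
            for z, d in zip(codes, filters)]

rng = np.random.default_rng(0)
filters = [rng.standard_normal(5) * 0.2 for _ in range(4)]  # assumed small dictionary
x = rng.standard_normal(64)                                 # toy 1-D "signal"
codes = [np.zeros_like(x) for _ in filters]
for _ in range(30):
    codes = ista_step(x, codes, filters)
```

Replacing the fixed filters, step size, and threshold with learned parameters, as in LISTA (Gregor and LeCun, 2010), yields the kind of trainable sparse coding module described above.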
Result
The proposed method was trained on the EyePACS dataset, and experiments were performed on the synthetic datasets DRIVE (digital retinal images for vessel extraction), CHASEDB1 (child heart and health study in England), and ROC (retinopathy online challenge) as well as the real-world datasets RF (real fundus) and HRF (high-resolution fundus) to demonstrate the generalizability of our model. Experimental results show that the evaluation metrics peak signal-to-noise ratio (PSNR) and learned perceptual image patch similarity (LPIPS) improve on average by 1.992 9 dB and 0.028 9, respectively, compared with the original diffusion model (learning enhancement from degradation, Led). Moreover, the proposed approach was used as a preprocessing module for downstream tasks; a retinal vessel segmentation experiment shows that it benefits downstream tasks in clinical application. Segmentation results on the DRIVE dataset show that all segmentation metrics improve compared with the original diffusion model: the area under the curve (AUC), accuracy (Acc), and sensitivity (Sen) improve by 0.031 4, 0.003 0, and 0.073 8 on average, respectively.
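For reference, the reported PSNR values follow the standard definition; a minimal sketch is given below, with an assumed data range of 1.0 (LPIPS, being a learned perceptual metric, requires a pretrained network and is omitted):

```python
import numpy as np

def psnr(ref, img, data_range=1.0):
    # Peak signal-to-noise ratio in dB; higher means closer to the reference.
    mse = np.mean((ref - img) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

ref = np.full((8, 8), 0.5)
noisy = ref + 0.1                   # constant offset -> MSE = 0.01 -> PSNR = 20 dB
print(round(psnr(ref, noisy), 1))   # prints 20.0
```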
Conclusion
The proposed method provides a practical tool for fundus image deblurring and a new perspective for improving the quality and accuracy of diagnosis. This approach benefits both patients and ophthalmologists and is expected to promote further development in the interdisciplinary research of ophthalmology and computer science.
眼底图像增强;卷积字典学习;稀疏表示;扩散模型;条件扩散模型
fundus image enhancement; convolutional dictionary learning; sparse representation; diffusion model; conditional diffusion model
Alimanov A and Islam M B. 2022. Retinal image restoration using Transformer and cycle-consistent generative adversarial network//2022 International Symposium on Intelligent Signal Processing and Communication Systems. Penang, Malaysia: IEEE: 1-4 [DOI: 10.1109/ispacs57703.2022.10082822]
Chen N X, Zhang Y, Zen H G, Weiss R J, Norouzi M and Chen W. 2020. WaveGrad: estimating gradients for waveform generation [EB/OL]. [2023-08-17]. https://arxiv.org/pdf/2009.00713.pdf
Cheng P J, Lin L, Huang Y J, He H Q, Luo W H and Tang X Y. 2023. Learning enhancement from degradation: a diffusion model for fundus image enhancement [EB/OL]. [2023-08-17]. https://arxiv.org/pdf/2303.04603.pdf
Cuadros J and Bresnick G. 2009. EyePACS: an adaptable telemedicine system for diabetic retinopathy screening. Journal of Diabetes Science and Technology, 3(3): 509-516 [DOI: 10.1177/193229680900300315]
Deng Z, Cai Y F, Chen L, Gong Z, Bao Q Q, Yao X, Fang D, Yang W M, Zhang S C and Ma L. 2022. RFormer: Transformer-based generative adversarial network for real fundus image restoration on a new clinical benchmark. IEEE Journal of Biomedical and Health Informatics, 26(9): 4645-4655 [DOI: 10.1109/jbhi.2022.3187103]
Dhariwal P and Nichol A. 2021. Diffusion models beat GANs on image synthesis [EB/OL]. [2023-08-17]. https://arxiv.org/pdf/2105.05233.pdf
Fraz M M, Remagnino P, Hoppe A, Uyyanonvara B, Rudnicka A R, Owen C G and Barman S A. 2012. An ensemble classification-based approach applied to retinal blood vessel segmentation. IEEE Transactions on Biomedical Engineering, 59(9): 2538-2548 [DOI: 10.1109/tbme.2012.2205687]
Gaudio A, Smailagic A and Campilho A. 2020. Enhancement of retinal fundus images via pixel color amplification//Proceedings of the 17th International Conference on Image Analysis and Recognition. Póvoa de Varzim, Portugal: Springer: 299-312 [DOI: 10.1007/978-3-030-50516-5_26]
Gregor K and LeCun Y. 2010. Learning fast approximations of sparse coding//Proceedings of the 27th International Conference on Machine Learning. Haifa, Israel: Omnipress: 399-406
Ho J, Jain A and Abbeel P. 2020. Denoising diffusion probabilistic models//Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc.: 6840-6851
Jiao L J, Wang W J, Zhao Q S and Cao J F. 2017. Image denoising based on sparse representation of neighbor local OMP. Journal of Image and Graphics, 22(11): 1486-1492
焦丽娟, 王文剑, 赵青杉, 曹建芳. 2017. 近邻局部OMP稀疏表示图像去噪. 中国图象图形学报, 22(11): 1486-1492 [DOI: 10.11834/jig.170105]
Köhler T, Budai A, Kraus M F, Odstrcilik J, Michelson G and Hornegger J. 2013. Automatic no-reference quality assessment for retinal fundus images using vessel segmentation//The 26th IEEE International Symposium on Computer-Based Medical Systems. Porto, Portugal: IEEE: 95-100 [DOI: 10.1109/cbms.2013.6627771]
Li H, Liu H F, Hu Y, Fu H Z, Zhao Y T, Miao H P and Liu J. 2022. An annotation-free restoration network for cataractous fundus images. IEEE Transactions on Medical Imaging, 41(7): 1699-1710 [DOI: 10.1109/tmi.2022.3147854]
Liang H and Li Q. 2016. Hyperspectral imagery classification using sparse representations of convolutional neural network features. Remote Sensing, 8(2): #99 [DOI: 10.3390/rs8020099]
Liu W T, Yang H H, Tian T, Cao Z W, Pan X P, Xu W J, Jin Y and Gao F. 2022. Full-resolution network and dual-threshold iteration for retinal vessel and coronary angiograph segmentation. IEEE Journal of Biomedical and Health Informatics, 26(9): 4623-4634 [DOI: 10.1109/JBHI.2022.3188710]
MacGillivray T J, Cameron J R, Zhang Q L, El-Medany A, Mulholland C, Sheng Z Y, Dhillon B, Doubal F N, Foster P J, Trucco E, Sudlow C and UK Biobank Eye and Vision Consortium. 2015. Suitability of UK biobank retinal images for automatic analysis of morphometric properties of the vasculature. PLoS ONE, 10(5): #e0127914 [DOI: 10.1371/journal.pone.0127914]
Nichol A and Dhariwal P. 2021. Improved denoising diffusion probabilistic models [EB/OL]. [2023-08-17]. https://arxiv.org/pdf/2102.09672.pdf
Niemeijer M, Van Ginneken B, Cree M J, Mizutani A, Quellec G, Sanchez C I, Zhang B, Hornero R, Lamard M, Muramatsu C, Wu X Q, Cazuguel G, You J, Mayo A, Li Q, Hatanaka Y, Cochener B, Roux C, Karray F, Garcia M, Fujita H and Abramoff M D. 2010. Retinopathy online challenge: automatic detection of microaneurysms in digital color fundus photographs. IEEE Transactions on Medical Imaging, 29(1): 185-195 [DOI: 10.1109/tmi.2009.2033909]
Rombach R, Blattmann A, Lorenz D, Esser P and Ommer B. 2022. High-resolution image synthesis with latent diffusion models//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 10674-10685 [DOI: 10.1109/cvpr52688.2022.01042]
Saharia C, Ho J, Chan W, Salimans T, Fleet D J and Norouzi M. 2022. Image super-resolution via iterative refinement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4): 4713-4726 [DOI: 10.1109/TPAMI.2022.3204461]
Schmidt-Erfurth U, Sadeghipour A, Gerendas B S, Waldstein S M and Bogunović H. 2018. Artificial intelligence in retina. Progress in Retinal and Eye Research, 67: 1-29 [DOI: 10.1016/j.preteyeres.2018.07.004]
Setiawan A W, Mengko T R, Santoso O S and Suksmono A B. 2013. Color retinal image enhancement using CLAHE//Proceedings of 2013 IEEE International Conference on ICT for Smart Society. Jakarta, Indonesia: IEEE: 1-3 [DOI: 10.1109/ictss.2013.6588092]
Shen Z Y, Fu H Z, Shen J B and Shao L. 2021. Modeling and enhancing low-quality retinal fundus images. IEEE Transactions on Medical Imaging, 40(3): 996-1006 [DOI: 10.1109/tmi.2020.3043495]
Sohl-Dickstein J, Weiss E A, Maheswaranathan N and Ganguli S. 2015. Deep unsupervised learning using nonequilibrium thermodynamics [EB/OL]. [2023-08-17]. https://arxiv.org/pdf/1503.03585.pdf
Staal J, Abràmoff M D, Niemeijer M, Viergever M A and van Ginneken B. 2004. Ridge-based vessel segmentation in color images of the retina. IEEE Transactions on Medical Imaging, 23(4): 501-509 [DOI: 10.1109/tmi.2004.825627]
Swapna T R, Indu D and Chakraborty C. 2015. Macular region enhancement of fundus fluorescein angiogram images using super resolution via sparse representation and quality analysis. Procedia Computer Science, 58: 586-592 [DOI: 10.1016/j.procs.2015.08.077]
Wan C, Zhou X T, You Q J, Sun J, Shen J X, Zhu S J, Jiang Q and Yang W H. 2022. Retinal image enhancement using cycle-constraint adversarial network. Frontiers in Medicine, 8: #793726 [DOI: 10.3389/fmed.2021.793726]
Wang L F, Dou J L, Qin P L, Lin S Z, Gao Y and Zhang C C. 2019. Medical image fusion using double dictionary learning and adaptive PCNN. Journal of Image and Graphics, 24(9): 1588-1603
王丽芳, 窦杰亮, 秦品乐, 蔺素珍, 高媛, 张程程. 2019. 双重字典学习与自适应PCNN相结合的医学图像融合. 中国图象图形学报, 24(9): 1588-1603 [DOI: 10.11834/jig.180667]
Wang W L, Bao J M, Zhou W G, Chen D D, Chen D, Yuan L and Li H Q. 2022. SinDiffusion: learning a diffusion model from a single natural image [EB/OL]. [2023-08-17]. https://arxiv.org/pdf/2211.12445.pdf
Zagoruyko S and Komodakis N. 2017. Wide residual networks [EB/OL]. [2023-08-17]. https://arxiv.org/pdf/1605.07146.pdf
Zamir S W, Arora A, Khan S, Hayat M, Khan F S, Yang M H and Shao L. 2020. Learning enriched features for real image restoration and enhancement//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 492-511 [DOI: 10.1007/978-3-030-58595-2_30]
Zhang R, Isola P, Efros A A, Shechtman E and Wang O. 2018. The unreasonable effectiveness of deep features as a perceptual metric//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 586-595 [DOI: 10.1109/cvpr.2018.00068]
Zhu W H, Qiu P J, Farazi M, Nandakumar K, Dumitrascu O M and Wang Y L. 2023. Optimal transport guided unsupervised learning for enhancing low-quality retinal images [EB/OL]. [2023-08-17]. https://arxiv.org/pdf/2302.02991.pdf