Multi-attention and cascaded context fusion for lesion segmentation of diabetic retinopathy
MCFNet: multi-attention and cascaded context fusion network for segmentation multiple lesion of diabetic retinopathy images
2024, Vol. 29, No. 12, Pages: 3800-3816
Print publication date: 2024-12-16
DOI: 10.11834/jig.230827
Guo Yanfei, Du Hangli, Yang Chenglong, Kong Xiangzhen. 2024. MCFNet: multi-attention and cascaded context fusion network for segmentation multiple lesion of diabetic retinopathy images. Journal of Image and Graphics, 29(12):3800-3816
Objective
Diabetic retinopathy (DR) is a leading cause of blindness in humans, and automatic, accurate lesion segmentation is crucial for DR grading, diagnosis, and treatment. However, the different types of DR lesions have complex structures and inconsistent scales, and they exhibit inter-class similarity and intra-class variability, which makes the simultaneous and accurate segmentation of multiple lesion types highly challenging. To address these problems, this paper proposes a multi-lesion segmentation method for DR based on multi-attention and cascaded context fusion.
Method
First, a triple attention module extracts channel attention, spatial attention, and pixel-point attention features of the lesions and fuses them by addition to ensure feature consistency. In addition, a cascaded context feature fusion module uses adaptive average pooling and non-local operations to extract global context information from different network levels, thereby enlarging the receptive field for the lesions. Finally, a balanced attention module computes attention maps for the lesion foreground, background, and boundary, and employs a squeeze-and-excitation module to weight the feature channels and rebalance the attention across the three regions, directing the network toward the edge details of the lesions and achieving refined segmentation.
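The additive fusion of the three attention branches described in the Method can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the sigmoid gating and the pooling choices in each branch are assumptions that stand in for the learned attention operations.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # Pool over spatial dimensions to get one descriptor per channel,
    # then gate each channel with a sigmoid weight.
    w = feat.mean(axis=(1, 2), keepdims=True)   # (C, 1, 1)
    return feat * sigmoid(w)

def spatial_attention(feat):
    # Pool over channels to get a single spatial map highlighting
    # lesion locations, then gate every channel with it.
    m = feat.mean(axis=0, keepdims=True)        # (1, H, W)
    return feat * sigmoid(m)

def pixel_attention(feat):
    # Per-element gate: an individual weight for every channel/pixel
    # position, useful for small-scale lesion responses.
    return feat * sigmoid(feat)

def triple_attention(feat):
    # Additive fusion of the three branches, as described above.
    return channel_attention(feat) + spatial_attention(feat) + pixel_attention(feat)

feat = np.random.rand(8, 16, 16).astype(np.float32)  # (C, H, W) feature map
out = triple_attention(feat)
```

The addition keeps all three branch outputs at the same scale, so the fused map stays shape-compatible with the encoder features it replaces.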
Result
Extensive comparison and ablation experiments were conducted on the publicly available DR image datasets DDR (dataset for diabetic retinopathy), IDRiD (Indian diabetic retinopathy image dataset), and E-Ophtha. The mean AUC (area under curve) over the four lesion types reaches 0.679 0, 0.750 3, and 0.660 1 on the three datasets, respectively.
Conclusion
The proposed multi-attention and cascaded context fusion network (MCFNet) for DR lesion segmentation overcomes the adverse interference of other fundus structures and lesion noise and simultaneously achieves accurate segmentation of the four DR lesion types, with good accuracy and robustness, providing strong support for clinicians in DR diagnosis and treatment.
Objective
Diabetic retinopathy (DR) is a leading cause of blindness in humans, and regular screening is helpful for its early detection and containment. While automated and accurate lesion segmentation is crucial for DR grading and diagnosis, this approach encounters many challenges due to the complex structures, inconsistent scales, and blurry edges of different kinds of lesions. However, the manual segmentation of DR lesions is time-consuming and labor-intensive, thus making the large-scale popularization of the approach particularly difficult due to the limited doctor resources and the high cost of manual annotation. Therefore, an automatic DR lesion segmentation method should be developed to reduce clinical workload and increase efficiency. Recently, convolutional neural networks have been widely applied in the fields of medical image segmentation and disease classification. The existing deep-learning-based methods for DR lesion segmentation are classified into image-based and patch-based approaches. Some studies have adopted the attention mechanism to segment lesions using the whole fundus image as input. However, these methods may lose the edge details of lesions, thus introducing challenges in obtaining fine-grained lesion segmentation results. Other studies have cropped the original images to patches and inputted them into the encoder-decoder networks for DR lesion segmentation. However, most of the approaches proposed in the literature utilize fixed weights to fuse coding features at different levels while ignoring the information differences among them, thus hindering the effective integration of multi-level features for accurate lesion segmentation. To address these issues, this paper proposes a multi-attention and cascaded context fusion network (MCFNet) for the simultaneous segmentation of multiple lesions.
Method
The proposed network adopts an encoder-decoder framework, including the VGG16 backbone network, triple attention module (TAM), cascaded context fusion module (CFM), and balanced attention module (BAM). First, directly fusing multi-level features from different stages of the encoder easily results in inconsistent feature scales and information redundancy. Dynamically selecting important information from multi-resolution feature maps not only preserves contextual information in low-resolution feature maps but also effectively reduces background noise interference in high-resolution feature maps. TAM is proposed to extract three types of attention features, i.e., channel attention, spatial attention, and pixel-point attention. Second, the channel attention assigns different weights to different feature channels to enable the selection of specific feature patterns for lesion segmentation. The spatial attention also highlights the location information of lesions in the feature map, thus making the proposed model pay attention to lesion areas. Lastly, the pixel-point attention mechanism extracts small-scale lesion features. TAM ensures feature consistency and selectivity by learning and fusing these attention features. In addition, traditional receptive field ranges can hinder the capture of subtle features due to the small size of lesions. To address this problem, CFM is proposed to capture global context information at different levels and to perform summation with local context information from TAM. The module is designed to expand the scope of multi-scale receptive fields and consequently improve the accuracy and robustness of small-scale lesion segmentation. This study also uses BAM to address the rough and inconspicuous lesion edges. This module calculates the foreground, background, and boundary attention map to reduce the adverse interference of the background noise and to clarify the lesion contour.
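The two context operations that CFM combines, adaptive average pooling and a non-local block, can be sketched roughly as follows. This is a simplified NumPy version: the embedding convolutions of a full non-local block are omitted, and the shapes and pooling sizes are assumptions for illustration only.

```python
import numpy as np

def adaptive_avg_pool(feat, out_hw):
    # Average-pool a (C, H, W) map down to (C, out_hw, out_hw),
    # assuming H and W are divisible by out_hw for simplicity.
    c, h, w = feat.shape
    fh, fw = h // out_hw, w // out_hw
    return feat.reshape(c, out_hw, fh, out_hw, fw).mean(axis=(2, 4))

def non_local(feat):
    # Simplified non-local block: every spatial position attends to
    # every other position, so the receptive field spans the whole map.
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)                    # (C, N)
    attn = x.T @ x                                # (N, N) pairwise affinities
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)       # softmax over positions
    y = x @ attn.T                                # aggregate global context
    return feat + y.reshape(c, h, w)              # residual connection

feat = np.random.rand(4, 8, 8).astype(np.float32)
pooled = adaptive_avg_pool(feat, 2)               # coarse global context
ctx = non_local(feat)                             # full-image receptive field
```

Summing such global context with the local features from TAM is what lets the network see beyond the small spatial extent of microaneurysm-scale lesions.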
Result
The lesion segmentation performance of the proposed method was compared with that of existing methods on the IDRiD, DDR, and E-Ophtha datasets. Experimental results show that despite the variations in the number and appearance of retinal images from different countries and ethnicities, the proposed model outperforms state-of-the-art methods in terms of accuracy and robustness. Specifically, on the IDRiD dataset, MCFNet achieves AUC values of 0.917 1, 0.719 7, 0.655 7, and 0.708 7 for the segmentation of EX, HE, MA, and SE lesions, respectively. The mAUC, mIOU, and mDice over the four kinds of lesions on the IDRiD dataset reach 0.750 3, 0.638 7, and 0.700 3, respectively. On the DDR dataset, the proposed model achieves mAUC, mIOU, and mDice values of 0.679 0, 0.434 7, and 0.598 9 for these lesions. Compared with PSPNet, the proposed method obtains 52.7%, 18.63%, and 33.06% higher mAUC, mIOU, and mDice values, respectively. On the E-Ophtha dataset, the proposed MCFNet achieves mAUC, mIOU, and mDice values of 0.660 1, 0.449 5, and 0.628 5, respectively. When compared with MSLF-Net, these values improve by 15.11%, 4.06%, and 20.68%, respectively. The segmentation results of the proposed model are also visually closer to the ground truth than those of competing methods, and the obtained edges are finer and more accurate. To verify the effectiveness of the proposed TAM, CFM, and BAM, comprehensive ablation experiments were conducted on the IDRiD, DDR, and E-Ophtha datasets. The model obtained mAUC, mIOU, and mDice values of 0.597 5, 0.451 2, and 0.584 8 on the IDRiD dataset when using only the baseline. The fusion of VGG16 with TAM, CFM, and BAM achieved the best segmentation results for all four types of multi-scale lesions, suggesting that the proposed modules contribute to improving multiple-lesion segmentation performance in varying degrees.
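As a quick arithmetic check, the reported mAUC on the IDRiD dataset is the mean of the four per-lesion AUC values:

```python
# Per-lesion AUCs reported for MCFNet on IDRiD (EX, HE, MA, SE).
aucs = [0.9171, 0.7197, 0.6557, 0.7087]
m_auc = sum(aucs) / len(aucs)
print(round(m_auc, 4))  # 0.7503, matching the reported mAUC
```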
Conclusion
This paper proposes a multi-attention and cascaded context fusion network for the multiple lesion segmentation of diabetic retinopathy images. The proposed MCFNet introduces TAM to learn and fuse channel attention, spatial attention, and pixel-point attention features to ensure feature consistency and selectivity. CFM utilizes adaptive average pooling and non-local operation to capture local and global contextual features for concatenation fusion and to expand the receptive field of fundus lesions. BAM calculates attention maps for the foreground, background, and lesion contours and uses the squeeze-and-excitation modules to rebalance the attention features of these regions, preserve the edge details, and reduce interference from background noise. Experimental results on the IDRiD, DDR, and E-Ophtha datasets demonstrate the superiority of the proposed method compared with the state-of-the-art. This method also effectively overcomes the interference of background and other lesion noises, thus achieving an accurate segmentation of different types of multi-scale lesions.
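The squeeze-and-excitation reweighting that BAM uses to rebalance the foreground, background, and boundary attention features can be sketched as follows. This is a minimal NumPy version: the bottleneck weights here are random stand-ins for parameters that are learned in the real module, and the reduction ratio is an assumption.

```python
import numpy as np

def squeeze_excitation(feat, reduction=2, rng=None):
    # Squeeze: global average pooling yields one descriptor per channel.
    # Excitation: a small bottleneck MLP (random stand-in weights; learned
    # in practice) produces a sigmoid gate for each channel.
    rng = np.random.default_rng(0) if rng is None else rng
    c = feat.shape[0]
    z = feat.mean(axis=(1, 2))                     # (C,) squeeze
    w1 = rng.standard_normal((c // reduction, c))  # bottleneck down-projection
    w2 = rng.standard_normal((c, c // reduction))  # bottleneck up-projection
    s = np.maximum(w1 @ z, 0.0)                    # ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))         # (C,) sigmoid gate
    return feat * gate[:, None, None]              # reweight channels

feat = np.random.rand(8, 4, 4).astype(np.float32)
out = squeeze_excitation(feat)
```

Because the gate lies in (0, 1) per channel, the module can suppress channels dominated by background noise while keeping boundary-sensitive channels strong.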
Keywords: diabetic retinopathy (DR); multi-lesion segmentation; triple attention; cascaded context fusion; balanced attention
Chollet F. 2017. Xception: deep learning with depthwise separable convolutions//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 1800-1807 [DOI: 10.1109/CVPR.2017.195]
Decencière E, Cazuguel G, Zhang X W, Thibault G, Klein J C, Meyer F, Marcotegui B, Quellec G, Lamard M, Danno R, Elie D, Massin P, Viktor Z, Erginay A, Laÿ B and Chabouis A. 2013. TeleOphta: machine learning and image processing methods for teleophthalmology. IRBM, 34(2): 196-203 [DOI: 10.1016/j.irbm.2013.01.010]
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X H, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J and Houlsby N. 2021. An image is worth 16 × 16 words: Transformers for image recognition at scale//Proceedings of 2021 International Conference on Learning Representations. Vienna, Austria: OpenReview.net
Feng S L, Zhao H M, Shi F, Cheng X N, Wang M, Ma Y H, Xiang D H, Zhu W F and Chen X J. 2020. CPFNet: context pyramid fusion network for medical image segmentation. IEEE Transactions on Medical Imaging, 39(10): 3008-3018 [DOI: 10.1109/TMI.2020.2983721]
Foo A, Hsu W, Lee M L, Lim G and Wong T Y. 2020. Multi-task learning for diabetic retinopathy grading and lesion segmentation//Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI Press: 13267-13272 [DOI: 10.1609/aaai.v34i08.7035]
Gu T F, Hao P Y, Bai C and Liu N. 2021. Diabetic retinopathy grading based on multi-channel attention. Journal of Image and Graphics, 26(7): 1726-1736 [DOI: 10.11834/jig.200518]
Gu Z W, Cheng J, Fu H Z, Zhou K, Hao H Y, Zhao Y T, Zhang T Y, Gao S H and Liu J. 2019. CE-Net: context encoder network for 2D medical image segmentation. IEEE Transactions on Medical Imaging, 38(10): 2281-2292 [DOI: 10.1109/TMI.2019.2903562]
Guo S, Li T, Kang H, Li N, Zhang Y J and Wang K. 2019. L-Seg: an end-to-end unified framework for multi-lesion segmentation of fundus images. Neurocomputing, 349: 52-63 [DOI: 10.1016/j.neucom.2019.04.019]
He A L, Wang K, Li T, Bo W, Kang H and Fu H Z. 2022. Progressive multiscale consistent network for multiclass fundus lesion segmentation. IEEE Transactions on Medical Imaging, 41(11): 3146-3157 [DOI: 10.1109/TMI.2022.3177803]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Hu J, Shen L and Sun G. 2018. Squeeze-and-excitation networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7132-7141 [DOI: 10.1109/CVPR.2018.00745]
Huang G, Liu Z, Van Der Maaten L and Weinberger K Q. 2017. Densely connected convolutional networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2261-2269 [DOI: 10.1109/CVPR.2017.243]
Huang S Q, Li J N, Xiao Y Z, Shen N and Xu T F. 2022. RTNet: relation transformer network for diabetic retinopathy multi-lesion segmentation. IEEE Transactions on Medical Imaging, 41(6): 1596-1607 [DOI: 10.1109/TMI.2022.3143833]
Kingma D P and Ba J. 2015. Adam: a method for stochastic optimization//Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: ICLR: 1-15
Li T, Gao Y Q, Wang K, Guo S, Liu H R and Kang H. 2019. Diagnostic assessment of deep learning algorithms for diabetic retinopathy screening. Information Sciences, 501: 511-522 [DOI: 10.1016/j.ins.2019.06.011]
Milletari F, Navab N and Ahmadi S A. 2016. V-Net: fully convolutional neural networks for volumetric medical image segmentation//Proceedings of the 4th International Conference on 3D Vision (3DV). Stanford, USA: IEEE: 565-571 [DOI: 10.1109/3DV.2016.79]
Nguyen T C, Nguyen T P, Diep G H, Tran-Dinh A H, Nguyen T V and Tran M T. 2021. CCBANet: cascading context and balancing attention for polyp segmentation//Proceedings of the 24th International Conference on Medical Image Computing and Computer-Assisted Intervention. Strasbourg, France: Springer: 633-643 [DOI: 10.1007/978-3-030-87193-2_60]
Pisano E D, Zong S Q, Hemminger B M, DeLuca M, Johnston R E, Muller K, Braeuning M P and Pizer S M. 1998. Contrast limited adaptive histogram equalization image processing to improve the detection of simulated spiculations in dense mammograms. Journal of Digital Imaging, 11(4): 193-200 [DOI: 10.1007/BF03178082]
Porwal P, Pachade S, Kamble R, Kokare M, Deshmukh G, Sahasrabuddhe V and Meriaudeau F. 2018. Indian diabetic retinopathy image dataset (IDRiD): a database for diabetic retinopathy screening research. Data, 3(3): #25 [DOI: 10.3390/data3030025]
Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer: 234-241 [DOI: 10.1007/978-3-319-24574-4_28]
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S A, Huang Z H, Karpathy A, Khosla A, Bernstein M, Berg A C and Li F F. 2015. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3): 211-252 [DOI: 10.1007/s11263-015-0816-y]
Simonyan K and Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition//Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: ICLR
Song R X, Cao P and Zhao D Z. 2022. Domain-adaptive-learning based diabetic retinopathy grading diagnosis. Journal of Image and Graphics, 27(11): 3356-3370 [DOI: 10.11834/jig.210411]
Tan T E and Wong T Y. 2023. Diabetic retinopathy: looking forward to 2030. Frontiers in Endocrinology, 13: #1077669 [DOI: 10.3389/fendo.2022.1077669]
Tang L, Xu G T and Zhang J F. 2023. Inflammation in diabetic retinopathy: possible roles in pathogenesis and potential implications for therapy. Neural Regeneration Research, 18(5): #976 [DOI: 10.4103/1673-5374.355743]
Thomas R L, Halim S, Gurudas S, Sivaprasad S and Owens D R. 2019. IDF diabetes atlas: a review of studies utilising retinal photography on the global prevalence of diabetes related retinopathy between 2015 and 2018. Diabetes Research and Clinical Practice, 157: #107840 [DOI: 10.1016/j.diabres.2019.107840]
Wang L W, Liu Z S, Siu W C and Lun D P K. 2020. Lightening network for low-light image enhancement. IEEE Transactions on Image Processing, 29: 7984-7996 [DOI: 10.1109/TIP.2020.3008396]
Wang S Q, Wang X Y, Hu Y, Shen Y Y, Yang Z L, Gan M and Lei B Y. 2021a. Diabetic retinopathy diagnosis using multichannel generative adversarial network with semisupervision. IEEE Transactions on Automation Science and Engineering, 18(2): 574-585 [DOI: 10.1109/TASE.2020.2981637]
Wang X F, Xu M, Zhang J C, Jiang L and Li L. 2021b. Deep multi-task learning for diabetic retinopathy grading in fundus images//Proceedings of the 35th AAAI Conference on Artificial Intelligence. Vancouver, Canada: AAAI: 2826-2834 [DOI: 10.1609/aaai.v35i4.16388]
Wang X L, Girshick R, Gupta A and He K M. 2018. Non-local neural networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7794-7803 [DOI: 10.1109/CVPR.2018.00813]
Wang X Y, Fang Y Q, Yang S, Zhu D L, Wang M H, Zhang J, Zhang J, Cheng J, Tong K Y and Han X. 2023. CLC-Net: contextual and local collaborative network for lesion segmentation in diabetic retinopathy images. Neurocomputing, 527: 100-109 [DOI: 10.1016/j.neucom.2023.01.013]
Xie S N, Girshick R, Dollár P, Tu Z W and He K M. 2017. Aggregated residual transformations for deep neural networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 5987-5995 [DOI: 10.1109/CVPR.2017.634]
Xie S N and Tu Z W. 2015. Holistically-nested edge detection//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 1395-1403 [DOI: 10.1109/ICCV.2015.164]
Xu Y F, Zhou Z M, Li X, Zhang N, Zhang M Z and Wei P P. 2021. FFU-Net: feature fusion U-Net for lesion segmentation of diabetic retinopathy. BioMed Research International, 2021: #6644071 [DOI: 10.1155/2021/6644071]
Xue J, Yan S, Qu J H, Qi F, Qiu C G, Zhang H Y, Chen M R, Liu T T, Li D W and Liu X Y. 2019. Deep membrane systems for multitask segmentation in diabetic retinopathy. Knowledge-Based Systems, 183: #104887 [DOI: 10.1016/j.knosys.2019.104887]
Yan H T, Xie J X, Zhu D L, Jia L K and Guo S J. 2022. MSLF-Net: a multi-scale and multi-level feature fusion net for diabetic retinopathy segmentation. Diagnostics, 12(12): #2918 [DOI: 10.3390/diagnostics12122918]
Zhang Y D, Liu H Y and Hu Q. 2021. TransFuse: fusing transformers and CNNs for medical image segmentation//Proceedings of the 24th International Conference on Medical Image Computing and Computer-Assisted Intervention. Strasbourg, France: Springer: 14-24 [DOI: 10.1007/978-3-030-87193-2_2]
Zhang Z Z, Liu M and Zhu D J. 2020. Automatic recognition and classification of diabetic retinopathy images by combining an attention mechanism and an efficient network. Journal of Image and Graphics, 25(8): 1708-1718 [DOI: 10.11834/jig.190644]
Zhao H S, Shi J P, Qi X J, Wang X G and Jia J Y. 2017. Pyramid scene parsing network//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6230-6239 [DOI: 10.1109/CVPR.2017.660]
Zhou Y, He X D, Huang L, Liu L, Zhu F, Cui S S and Shao L. 2019. Collaborative learning of semi-supervised segmentation and classification for medical images//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 2074-2083 [DOI: 10.1109/CVPR.2019.00218]