TransAS-UNet: breast cancer region segmentation fusing Swin Transformer and UNet
2024, Vol. 29, No. 3, Pages 741-754
Print publication date: 2024-03-16
DOI: 10.11834/jig.230130
Xu Wangwang, Xu Liangfeng, Li Bokai, Zhou Xi, Lyu Na and Zhan Shu. 2024. TransAS-UNet: breast cancer region segmentation fusing Swin Transformer and UNet. Journal of Image and Graphics, 29(3): 741-754
Objective
Breast cancer is a serious disease with high morbidity among women, and its early detection is an important problem that needs to be solved all over the world. Current diagnostic methods for breast cancer include clinical, imaging, and histopathological examinations. The modalities commonly used in imaging examination are X-ray, computed tomography (CT), and magnetic resonance imaging (MRI), among which mammograms have been used to detect early cancers. However, manually segmenting masses from local mammograms is a very time-consuming and error-prone task. Therefore, an integrated computer-aided diagnosis (CAD) system is needed to help radiologists perform automatic and precise breast mass identification.
Method
Based on a deep learning image segmentation framework, we compared different segmentation models and, building on the UNet structure, adopted the Swin Transformer architecture to replace the downsampling and upsampling stages of the segmentation task, enabling interaction between local and global features. By analogy with the UNet structure, Swin Transformer and atrous spatial pyramid pooling (ASPP) modules replace the ordinary convolution layers. Plain UNet-style downsampling and upsampling struggle to blend feature information across levels and to give the convolutional layers self-attention over local information. The Swin Transformer, with its sliding-window operation and hierarchical design, addresses both problems: its Window Attention and Shifted Window Attention modules slice the input feature map into multiple windows, compute self-attention within each window, and then shift the window positions across the feature map. This realizes information interaction within the same feature map and extracts complementary information between non-adjacent regions. Four Swin Transformer stages are used in both the downsampling and upsampling paths.
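To make the shifted-window mechanism concrete, the following is a minimal PyTorch sketch of window partitioning with a cyclic shift, the two operations that Window Attention and Shifted Window Attention are built on. The (B, H, W, C) layout, the toy feature map, and window size 4 are illustrative assumptions rather than the paper's exact settings.

```python
# Minimal sketch of Swin-style window partitioning and cyclic shift.
# Layout (B, H, W, C) and window_size=4 are illustrative assumptions,
# not the exact configuration used by TransAS-UNet.
import torch

def window_partition(x: torch.Tensor, window_size: int) -> torch.Tensor:
    """Slice a (B, H, W, C) feature map into (num_windows*B, ws*ws, C) windows."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    # -> (B, nH, nW, ws, ws, C), then flatten the windows into the batch dimension
    windows = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size, C)
    return windows

x = torch.randn(1, 8, 8, 96)                            # toy feature map
shifted = torch.roll(x, shifts=(-2, -2), dims=(1, 2))   # cyclic shift by ws//2
windows = window_partition(shifted, window_size=4)      # -> (4, 16, 96)
# Self-attention would now run independently inside each 16-token window;
# rolling back with shifts=(2, 2) restores the original spatial layout.
print(windows.shape)
```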
During feature fusion, a pyramid ASPP structure replaces the usual channel-wise addition of feature maps: the given input is convolved in parallel with atrous (dilated) kernels at different sampling rates, and the resulting feature maps are fused along the channel dimension. The network therefore captures image context at multiple scales, and the ASPP structure preserves self-attention over local information while the receptive field grows.
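The ASPP fusion described above can be sketched as follows. This is a generic ASPP block in the spirit of DeepLab; the dilation rates (1, 6, 12, 18) and the channel sizes are assumptions of this sketch, not the exact module used in TransAS-UNet.

```python
# Hedged sketch of an ASPP block: parallel dilated 3x3 convolutions at
# several rates plus a 1x1 branch, concatenated and fused channel-wise.
import torch
import torch.nn as nn

class ASPP(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([nn.Conv2d(in_ch, out_ch, kernel_size=1)])
        for r in rates:
            # dilation enlarges the receptive field without extra parameters;
            # padding=r keeps the spatial size identical across branches
            self.branches.append(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r))
        self.fuse = nn.Conv2d(out_ch * (len(rates) + 1), out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [b(x) for b in self.branches]       # same spatial size per branch
        return self.fuse(torch.cat(feats, dim=1))   # channel-wise fusion

aspp = ASPP(96, 64)
print(aspp(torch.randn(1, 96, 32, 32)).shape)       # torch.Size([1, 64, 32, 32])
```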
To better integrate high- and low-dimensional spatial information, we propose a new multi-scale feature-map fusion strategy: a Transformer with skip connections correlates information between different layers, preventing the loss of important shallow features during downsampling convolution and strengthening the spatial-domain representation. The final architecture thus inherits the Transformer's advantage in learning global semantic associations while using features at different levels to preserve more semantics and detail in the model. We compared the proposed segmentation network with five other structures, UNet, UNet++, Res18_UNet, MultiRes_UNet, and Dense_UNet, and it produced a more accurate binary map of the cancer region.
The binarized images produced by the segmentation model then serve as the input dataset of a classification network that identifies different categories of breast cancer tumors. Following the introduction of the INbreast dataset, each image is labeled as normal, mass, deformation, or calcification before being fed to the classifier. The classification model takes ResNet50 as its baseline and adds two kinds of attention: selective kernel (SK) convolution replaces the 3 × 3 convolution in every bottleneck, so the convolutional layers extract more image features, while squeeze-and-excitation (SE) channel attention weights each channel before its pixel values are output. These attention modules focus the network on the differences among segmented regions and improve its efficiency. Three further techniques, namely Gaussian error gradient descent, label smoothing, and partial data augmentation, are introduced to suppress overfitting and improve the accuracy of the model.
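As an illustration of the channel-attention side of this classifier, here is a minimal squeeze-and-excitation block in PyTorch; the reduction ratio of 16 follows the SE-Net paper, and the exact wiring into the ResNet50 bottlenecks is an assumption of this sketch, not the paper's verified configuration.

```python
# Minimal squeeze-and-excitation (SE) block of the kind added to the
# ResNet50 classifier; reduction=16 is an assumed, conventional setting.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)        # global spatial average
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                             # per-channel weight in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.excite(self.squeeze(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                  # reweight each channel

feat = torch.randn(2, 256, 14, 14)
print(SEBlock(256)(feat).shape)                       # torch.Size([2, 256, 14, 14])
```

The label smoothing mentioned above is likewise available out of the box in recent PyTorch releases, e.g. `nn.CrossEntropyLoss(label_smoothing=0.1)`.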
Result
In the same parameter environment, the proposed model accurately segmented masses on the INbreast mammography dataset: the intersection over union (IoU) reached 95.58% and the Dice coefficient 93.45%, 4%-6% higher than those of the other segmentation models. Classifying the resulting binary segmentation images into the four categories yielded an accuracy of 95.24%.
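For reference, overlap scores of this kind can be computed from a predicted binary mask and its ground truth as sketched below; the 0/1 mask encoding and the small epsilon are assumptions of this sketch, which the paper does not prescribe.

```python
# How IoU and Dice scores are computed from a predicted binary mask
# and its ground-truth mask (both encoded as 0/1 values).
import torch

def iou_and_dice(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7):
    """pred, target: binary masks of identical shape (values 0 or 1)."""
    pred, target = pred.bool(), target.bool()
    inter = (pred & target).sum().float()             # |A ∩ B|
    union = (pred | target).sum().float()             # |A ∪ B|
    iou = inter / (union + eps)
    dice = 2 * inter / (pred.sum() + target.sum() + eps)
    return iou.item(), dice.item()

pred = (torch.rand(1, 256, 256) > 0.5).long()         # toy prediction
gt = (torch.rand(1, 256, 256) > 0.5).long()           # toy ground truth
print(iou_and_dice(pred, gt))
```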
Conclusion
Experiments show that our proposed TransAS-UNet image segmentation method demonstrates good performance and clinical significance and is superior to the compared 2D medical image segmentation methods.
Keywords: breast cancer; deep learning; medical image segmentation; TransAS-UNet; image classification
Bray F, Ferlay J, Soerjomataram I, Siegel R L, Torre L A and Jemal A. 2018. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: A Cancer Journal for Clinicians, 68(6): 394-424 [DOI: 10.3322/caac.21492]
Cao Y, Xu J R, Lin S, Wei F Y and Hu H. 2019. GCNet: non-local networks meet squeeze-excitation networks and beyond//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision Workshop. Seoul, Korea (South): IEEE: 1971-1980 [DOI: 10.48550/arXiv.1904.11492]
Cardoso J S, Marques N, Dhungel N, Carneiro G and Bradley A P. 2017. Mass segmentation in mammograms: a cross-sensor comparison of deep and tailored features//Proceedings of 2017 IEEE International Conference on Image Processing (ICIP). Beijing, China: IEEE: 1737-1741 [DOI: 10.1109/ICIP.2017.8296579]
Chen G P, Dai Y and Zhang J X. 2023. RRCNet: refinement residual convolutional network for breast ultrasound images segmentation. Engineering Applications of Artificial Intelligence, 117: #105601 [DOI: 10.1016/j.engappai.2022.105601]
Chen L C, Papandreou G, Kokkinos I, Murphy K and Yuille A L. 2018. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4): 834-848 [DOI: 10.48550/arXiv.1606.00915]
Cho S W, Baek N R and Park K R. 2022. Deep learning-based multi-stage segmentation method using ultrasound images for breast cancer diagnosis. Journal of King Saud University - Computer and Information Sciences, 34(10): 10273-10292 [DOI: 10.1016/j.jksuci.2022.10.020]
George M J and Sankar S P. 2017. Efficient preprocessing filters and mass segmentation techniques for mammogram images//Proceedings of 2017 IEEE International Conference on Circuits and Systems (ICCS). Thiruvananthapuram, India: IEEE: 408-413 [DOI: 10.1109/ICCS1.2017.8326032]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
He Q Q, Yang Q J and Xie M H. 2023. HCTNet: a hybrid CNN-Transformer network for breast ultrasound image segmentation. Computers in Biology and Medicine, 155: #106629 [DOI: 10.1016/j.compbiomed.2023.106629]
Hou P and Qi Y L. 2021. Automatic region segmentation method of breast tumors based on deep neural networks. Journal of Biomedical Engineering Research, 40(3): 241-245 [DOI: 10.19529/j.cnki.1672-6278.2021.03.03]
Hu J, Shen L, Albanie S, Sun G and Wu E H. 2017. Squeeze-and-excitation networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(8): 2011-2023 [DOI: 10.48550/arXiv.1709.01507]
Huang H M, Lin L F, Tong R F, Hu H J, Zhang Q W, Iwamoto Y, Han X H, Chen Y W and Wu J. 2020. UNet 3+: a full-scale connected UNet for medical image segmentation//Proceedings of ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Barcelona, Spain: IEEE: 1055-1059 [DOI: 10.1109/ICASSP40776.2020.9053405]
Ibtehaz N and Rahman M S. 2020. MultiResUNet: rethinking the U-Net architecture for multimodal biomedical image segmentation. Neural Networks, 121: 74-87 [DOI: 10.1016/j.neunet.2019.08.025]
Iqbal A and Sharif M. 2023. PDF-UNet: a semi-supervised method for segmentation of breast tumor images using a U-shaped pyramid-dilated network. Expert Systems with Applications, 221: #119718 [DOI: 10.1016/j.eswa.2023.119718]
Jai-Andaloussi S, Sekkaki A, Quellec G, Lamard M, Cazuguel G and Roux C. 2013. Mass segmentation in mammograms by using bidimensional empirical mode decomposition BEMD//Proceedings of the 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). Osaka, Japan: IEEE: 5441-5444 [DOI: 10.1109/EMBC.2013.6610780]
Kaku A, Hegde C V, Huang J, Chung S, Wang X Y, Young M, Radmanesh A, Lui Y W and Razavian N. 2019. DARTS: DenseUnet-based automatic rapid tool for brain segmentation [EB/OL]. [2023-03-24]. https://arxiv.org/pdf/1911.05567.pdf
Ke L, He W and Kang Y. 2009. Mass auto-detection in mammogram based on wavelet transform modulus maximum//Proceedings of 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society. Minneapolis, USA: IEEE: 5760-5763 [DOI: 10.1109/IEMBS.2009.5332615]
Kirillov A, Mintun E, Ravi N, Mao H Z, Rolland C, Gustafson L, Xiao T T, Whitehead S, Berg A C, Lo W Y, Dollár P and Girshick R. 2023. Segment anything [EB/OL]. [2023-03-24]. https://arxiv.org/pdf/2304.02643.pdf
Li X, Wang W H, Hu X L and Yang J. 2019. Selective kernel networks//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 510-519 [DOI: 10.1109/CVPR.2019.00060]
Lin T Y, Dollár P, Girshick R, He K M, Hariharan B and Belongie S. 2017. Feature pyramid networks for object detection//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 2117-2125 [DOI: 10.48550/arXiv.1612.03144]
Liu Z, Lin Y T, Cao Y, Hu H, Wei Y X, Zhang Z, Lin S and Guo B N. 2021. Swin Transformer: hierarchical vision Transformer using shifted windows [EB/OL]. [2023-03-24]. https://arxiv.org/pdf/2103.14030.pdf
Ma J Q, Zhao S M and Kong F H. 2022. Semantic image segmentation by using multi-scale strip pooling and channel attention. Journal of Image and Graphics, 27(12): 3530-3541 [DOI: 10.11834/jig.210359]
Nelson A D and Krishna S. 2023. An effective approach for the nuclei segmentation from breast histopathological images using star-convex polygon. Procedia Computer Science, 218: 1778-1790 [DOI: 10.1016/j.procs.2023.01.156]
Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer: 234-241 [DOI: 10.1007/978-3-319-24574-4_28]
Ruan X L, Liu Q, Guo Z H and Yan J F. 2022. Research on breast cancer prediction model. Journal of Medical Informatics, 43(5): 34-39
Saad G, Khadour A and Kanafani Q. 2016. ANN and Adaboost application for automatic detection of microcalcifications in breast cancer. The Egyptian Journal of Radiology and Nuclear Medicine, 47(4): 1803-1814 [DOI: 10.1016/j.ejrnm.2016.08.020]
Salih A M and Kamil M Y. 2018. Mammography image segmentation based on fuzzy morphological operations//Proceedings of the 1st Annual International Conference on Information and Sciences (AiCIS). Fallujah, Iraq: IEEE: 40-44 [DOI: 10.1109/AiCIS.2018.00020]
Sun H, Li C, Liu B Q, Liu Z Y, Wang M Y, Zheng H R, Feng D D and Wang S S. 2020. AUNet: attention-guided dense-upsampling networks for breast mass segmentation in whole mammograms. Physics in Medicine & Biology, 65(5): #055005 [DOI: 10.48550/arXiv.1810.10151]
Wen K, Jin X, An H, He J and Wang J. 2023. CentroidNet: a light-weight, fast nuclei centroid detection model for breast Ki67 scoring. Journal of Image and Graphics, 28(4): 1119-1133 [DOI: 10.11834/jig.211207]
Wu H K, Zhang J G, Huang K Q, Liang K M and Yu Y Z. 2019. FastFCN: rethinking dilated convolution in the backbone for semantic segmentation [EB/OL]. [2023-03-24]. https://arxiv.org/pdf/1903.11816.pdf
Xiao X, Lian S, Luo Z M and Li S Z. 2018. Weighted Res-UNet for high-quality retina vessel segmentation//Proceedings of the 9th International Conference on Information Technology in Medicine and Education (ITME). Hangzhou, China: IEEE: 327-331 [DOI: 10.1109/ITME.2018.00080]
Xu L, Song H H and Liu Q S. 2023. Super-resolution reconstruction of binocular image based on multi-level fusion attention network. Journal of Image and Graphics, 28(4): 1079-1090 [DOI: 10.11834/jig.211119]
Yang X, Wang R, Zhao D, Yu F H, Heidari A A, Xu Z Z, Chen H L, Algarni A D, Elmannai H and Xu S L. 2023. Multi-level threshold segmentation framework for breast cancer images using enhanced differential evolution. Biomedical Signal Processing and Control, 80(2): #104373 [DOI: 10.1016/j.bspc.2022.104373]
Zhang Y, Tomuro N, Furst J and Raicu D S. 2010. Image enhancement and edge-based mass segmentation in mammogram//Proceedings Volume 7623, Medical Imaging 2010: Image Processing. San Diego, USA: SPIE: 1452-1459 [DOI: 10.1117/12.844492]
Zhao H S, Shi J P, Qi X J, Wang X G and Jia J Y. 2017. Pyramid scene parsing network//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 6230-6239 [DOI: 10.48550/arXiv.1612.01105]
Zhou Z W, Rahman Siddiquee M M, Tajbakhsh N and Liang J M. 2018. UNet++: a nested U-Net architecture for medical image segmentation [EB/OL]. [2023-03-24]. https://arxiv.org/pdf/1807.10165.pdf