结合局部全局特征与多尺度交互的三维多器官分割网络
3D multi-organ segmentation network combining local and global features and multi-scale interaction
2024年29卷第3期 页码:655-669
纸质出版日期: 2024-03-16
DOI: 10.11834/jig.230356
柴静雯, 李安康, 张浩, 马泳, 梅晓光, 马佳义. 2024. 结合局部全局特征与多尺度交互的三维多器官分割网络. 中国图象图形学报, 29(03):0655-0669
Chai Jingwen, Li Ankang, Zhang Hao, Ma Yong, Mei Xiaoguang, Ma Jiayi. 2024. 3D multi-organ segmentation network combining local and global features and multi-scale interaction. Journal of Image and Graphics, 29(03):0655-0669
目的
高度适形放射治疗是常用的癌症治疗方法,该方法的有效性依赖于对癌组织和周边多个危及器官(organ at risk,OAR)解剖结构的精确刻画,因此研究三维图像多器官的高精度自动分割具有重要意义。以视觉Transformer(vision Transformer,ViT)和卷积神经网络(convolutional neural network,CNN)结合为代表的三维医学图像分割方法表现出了丰富的应用优势。然而,这类方法往往忽略同一尺度内和不同尺度间的信息交互,使得CNN和ViT特征的提取和融合受限。本文提出一种端到端多器官分割网络LoGoFUNet(local-global-features fusion UNet),旨在应对现有方法的缺陷。
方法
首先,针对单一器官分割,提出在同一尺度下并行提取并融合CNN和ViT特征的LoGoF(local-global-features fusion)编码器,并构建了一个端到端的三维医学图像分割多尺度网络M0。此外,考虑到器官内部以及器官之间的相互关系,该方法在M0网络的基础上设计并引入了多尺度交互(multi-scale interaction,MSI)模块和注意力指导(attention guidance,AG)结构,最终形成了LoGoFUNet。
结果
在Synapse数据集和SegTHOR(segmentation of thoracic organs at risk)数据集上,本文方法相比于表现第2的模型,DSC(Dice similarity coefficient)指标分别提高了2.94%和4.93%,而HD95(95% Hausdorff distance)指标则分别降低了8.55和2.45,切实提升了多器官分割任务的性能表现。在ACDC(automatic cardiac diagnosis challenge)数据集上,3D分割方法的适用性大多较差,但LoGoFUNet依然得到了比2D先进方法更好的结果,说明其对数据集的适应能力更强。
结论
该分割模型综合了尺度内和尺度间的信息交互,取得了更好的分割结果,且在不同数据集上具有更好的泛化性。
Objective
Highly conformal radiotherapy is a widely adopted cancer treatment modality that requires meticulous characterization of the cancer tissue and comprehensive delineation of the surrounding organ-at-risk (OAR) anatomy. The efficacy and safety of this technique depend largely on the ability to precisely target the tumor while sparing nearby normal structures, so accurate and detailed depiction of the neoplastic and adjacent normal tissues in medical images is critical to optimizing treatment outcomes. Because conventional segmentation methods remain inadequate for accurate and efficient delineation of multi-organ structures in 3D medical images, precise automated segmentation based on deep learning is a promising research direction. By leveraging the capacity of deep neural networks (DNNs) to learn complex hierarchical representations from large amounts of labeled data, such techniques can identify and extract specific features and patterns from medical images, yielding considerably more reliable and efficient segmentation outcomes. They can thereby enhance the clinical utility of imaging data in diagnostic and therapeutic applications, including but not limited to radiation therapy planning, surgical navigation, and disease assessment. Over the past few years, there has been growing interest in combining the vision Transformer (ViT) with convolutional neural networks (CNNs) to improve the quality and accuracy of semantic segmentation. One promising line of work addresses multi-scale representation, which is critical for robust and precise segmentation across medical imaging datasets. However, current state-of-the-art methods fail to fully exploit multi-scale interaction between CNNs and ViTs. Some methods disregard multi-scale structures entirely or obtain them only by restricting the computational scope of the ViT; others rely solely on a CNN or a ViT at a given scale, ignoring their complementary advantages. In addition, existing multi-scale interaction methods often neglect the spatial association between two-dimensional slices, which degrades their performance on volumetric data. Further research is therefore needed to address these problems.
Method
This research addresses the limitations of existing methods for multi-organ segmentation in 3D medical images. Recognizing the importance of simultaneously extracting local and global features at the same scale, a universal feature encoder, the LoGoF (local-global-features fusion) module, is introduced for multi-organ segmentation networks. Building on this module, an end-to-end 3D medical image multi-organ segmentation network (denoted as M0) is constructed. To further enhance the model's ability to capture complex relationships within and between organs at different scales, a multi-scale interaction (MSI) module and an attention guidance (AG) structure are incorporated into M0. These components introduce spatial priors into the features extracted at different scales, enabling the network to accurately perceive inter-organ relationships and identify organ boundaries. The resulting model, LoGoFUNet, achieves robust and efficient multi-organ segmentation in 3D medical images and represents a significant step toward improving the accuracy and efficiency of multi-organ segmentation in clinical applications.
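To make the encoder idea concrete, the following is a minimal PyTorch sketch of a LoGoF-style block: a 3D convolutional branch (local features) and a self-attention branch (global features) run in parallel at one scale and are fused by concatenation plus a 1×1×1 convolution. This is our own illustration under stated assumptions; the channel sizes, normalization choices, and fusion strategy are hypothetical and not the authors' published implementation.

```python
# Hypothetical sketch of a parallel local/global (LoGoF-style) encoder block.
# Not the authors' code; fusion and normalization choices are assumptions.
import torch
import torch.nn as nn


class LoGoFBlockSketch(nn.Module):
    """Extract local (3D conv) and global (self-attention) features at one
    scale, then fuse them with concatenation + a 1x1x1 convolution."""

    def __init__(self, in_ch: int, out_ch: int, num_heads: int = 4):
        super().__init__()
        # Local branch: standard 3D convolutional features.
        self.local = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.InstanceNorm3d(out_ch),
            nn.GELU(),
        )
        # Global branch: voxel tokens at the same scale attend to each other.
        self.proj = nn.Conv3d(in_ch, out_ch, kernel_size=1)
        self.attn = nn.MultiheadAttention(out_ch, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(out_ch)
        # Fusion of the two parallel branches.
        self.fuse = nn.Conv3d(2 * out_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, D, H, W) volumetric features.
        loc = self.local(x)

        g = self.proj(x)                       # (B, C', D, H, W)
        b, c, d, h, w = g.shape
        tokens = self.norm(g.flatten(2).transpose(1, 2))  # (B, D*H*W, C')
        glob, _ = self.attn(tokens, tokens, tokens)
        glob = glob.transpose(1, 2).reshape(b, c, d, h, w)

        return self.fuse(torch.cat([loc, glob], dim=1))


if __name__ == "__main__":
    block = LoGoFBlockSketch(in_ch=1, out_ch=16)
    vol = torch.randn(1, 1, 4, 16, 16)  # toy CT patch
    print(block(vol).shape)  # torch.Size([1, 16, 4, 16, 16])
```

The spatial size is preserved through the block, matching the encoder role described above; in a full network such blocks would sit at several scales, with the MSI module exchanging information between them.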
Result
In experiments on two well-known medical imaging datasets, Synapse and SegTHOR, LoGoFUNet demonstrated clear gains in accuracy over the second-best performing model. Compared with the runner-up, LoGoFUNet improved the Dice similarity coefficient (DSC) by 2.94% on the Synapse dataset and by 4.93% on the SegTHOR dataset, while the 95th-percentile Hausdorff distance (HD95) decreased by 8.55 and 2.45, respectively, indicating an overall improvement in multi-organ segmentation performance. On the ACDC dataset, most 3D segmentation methods transfer poorly, but LoGoFUNet still obtained better results than advanced 2D methods, indicating superior adaptability and versatility across dataset types. These findings suggest that LoGoFUNet is a highly competitive and robust framework for accurate multi-organ segmentation in various clinical settings. Further ablation experiments provide additional evidence for the effectiveness of LoGoFUNet: they verify the role and contribution of each proposed component, including the LoGoF encoder, the multi-scale interaction module, and the attention guidance structure, in achieving the observed segmentation performance. By systematically removing each component and evaluating its impact on segmentation accuracy, these experiments confirm that the proposed module design is rational and effective, further reinforcing the value and potential clinical significance of LoGoFUNet for multi-organ segmentation in 3D medical imaging applications.
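For reference, the two reported metrics follow their standard definitions. The sketch below (numpy/scipy, our own illustration rather than the authors' evaluation code) computes the DSC on binary masks and the 95th-percentile symmetric Hausdorff distance on surface point sets.

```python
# Standard DSC and HD95 definitions on toy inputs; illustrative only.
import numpy as np
from scipy.spatial.distance import cdist


def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * inter / denom if denom else 1.0


def hd95(pred_pts: np.ndarray, gt_pts: np.ndarray) -> float:
    """95th-percentile symmetric Hausdorff distance between two
    (N, 3) / (M, 3) arrays of surface-voxel coordinates."""
    d = cdist(pred_pts, gt_pts)  # pairwise Euclidean distances
    return max(np.percentile(d.min(axis=1), 95),   # pred -> gt
               np.percentile(d.min(axis=0), 95))   # gt -> pred
```

A higher DSC and a lower HD95 are better, which is why the reported +2.94%/+4.93% DSC gains and the 8.55/2.45 HD95 reductions both indicate improvement.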
Conclusion
The experimental evaluation suggests that the proposed segmentation model effectively integrates information exchange within and between scales, leading to improved segmentation performance and superior generalization across datasets. By facilitating the interaction of multi-scale representations through intra- and inter-scale information exchange mechanisms, the model accurately captures complex spatial relationships and produces high-quality segmentations on a range of 3D medical imaging datasets. These findings highlight the importance of multi-scale features and information exchange for robust and accurate medical image segmentation, and suggest that the proposed framework could provide significant benefits in a variety of clinical applications.
多器官分割；深度神经网络(DNN)；视觉Transformer(ViT)；局部全局特征；多尺度交互(MSI)
multi-organ segmentation; deep neural network (DNN); vision Transformer (ViT); local-global feature; multi-scale interaction (MSI)