Cross-modal attention YOLOv5 PET/CT lung cancer detection
2024, Vol. 29, No. 4, pp. 1070-1084
Print publication date: 2024-04-16
DOI: 10.11834/jig.230169
Zhou Tao, Ye Xinyu, Zhao Yanan, Lu Huiling, Liu Fengzhen. 2024. Cross-modal attention YOLOv5 PET/CT lung cancer detection. Journal of Image and Graphics, 29(04):1070-1084
Objective
The atypical early symptoms of lung tumors often cause patients to miss the optimal window for treatment, so effective and accurate lung tumor detection techniques are becoming increasingly important in computer-aided diagnosis. In multimodal PET/CT (positron emission computed tomography/computed tomography) images of lung tumors, however, adhesion between the tumor and surrounding tissues leads to blurred edges and low contrast, and lesion regions are small and unevenly distributed in size. To address these problems, a cross-modal attention YOLOv5 (CA-YOLOv5) model for lung tumor detection is proposed.
Method
First, a two-branch parallel self-learning attention is designed in the backbone network: it learns scale coefficients through instance normalization and measures the information content of each feature from the difference between the feature value and the mean value, thereby enhancing tumor features and improving contrast. Second, to fully learn the complementary information of the multimodal images, cross-modal attention is designed for interactive learning of multimodal features, in which a Transformer models the long-range interdependencies of deep and shallow features and learns functional and anatomical information to improve lung tumor recognition. Finally, to address the small and unevenly sized lesion regions, a dynamic feature enhancement module is designed that uses multi-branch grouped dilated convolutions and grouped deformable convolutions with different receptive fields, enabling the network to mine the multiscale semantic information of lung tumor features fully and efficiently.
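To make the two-branch self-learning attention concrete, here is a minimal PyTorch sketch under stated assumptions: the instance-norm scale factors serve as channel weights, the second branch scores each feature by its deviation from the channel mean, and the two branches are fused by summation. The module name, gating, and fusion are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class SelfLearningAttention(nn.Module):
    """Hypothetical sketch of the two-branch parallel self-learning attention."""

    def __init__(self, channels: int):
        super().__init__()
        # Affine instance norm supplies learnable per-channel scale factors.
        self.inorm = nn.InstanceNorm2d(channels, affine=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x_n = self.inorm(x)

        # Branch 1: normalized scale coefficients act as channel importance.
        gamma = self.inorm.weight.abs()
        w_ch = (gamma / gamma.sum()).view(1, -1, 1, 1)
        branch1 = torch.sigmoid(x_n * w_ch) * x

        # Branch 2: information content measured as deviation from the mean.
        mean = x.mean(dim=(2, 3), keepdim=True)
        branch2 = torch.sigmoid((x - mean).abs()) * x

        return branch1 + branch2  # parallel fusion by summation (assumed)
```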
Result
Compared with ten other methods on the lung tumor PET/CT dataset, CA-YOLOv5 achieved the best performance, with 97.37% precision, 94.01% recall, 96.36% mAP (mean average precision), and 95.67% F1 score, along with the shortest training time on the same device. On the LUNA16 (lung nodule analysis 16) dataset, it likewise achieved the best performance, with 97.52% precision and 97.45% mAP.
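As a quick arithmetic check on the reported metrics, the F1 score is the harmonic mean of the reported precision P and recall R:

```latex
F_1 \;=\; \frac{2PR}{P+R}
    \;=\; \frac{2 \times 0.9737 \times 0.9401}{0.9737 + 0.9401}
    \;\approx\; 0.9566
```

which agrees with the reported 95.67% up to rounding of P and R.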
Conclusion
Building on complementary multimodal features, this paper proposes the cross-modal attention YOLOv5 detection model, which uses attention mechanisms and multiscale semantic information to achieve effective recognition of lung tumors in multimodal images, making the model more accurate and robust.
Objective
Cancer is the second leading cause of death worldwide, and lung cancer accounts for nearly one in five cancer deaths. Many cancers have a high chance of cure through early detection and effective therapeutic care, but the atypical early symptoms of lung cancer can easily cause patients to miss the optimal window for treatment. Successful identification of benign and malignant lesions allows treatment to be planned in time to reduce the risk of death. Manual determination of lung cancer is a time-consuming and error-prone process, so effective and accurate lung cancer detection techniques are becoming increasingly important in computer-aided diagnosis.
Method
Computed tomography (CT) is a common clinical modality for examining lung conditions, localizing lesion structures through anatomical information, while positron emission tomography (PET) reveals the pathophysiological features of lesions by detecting glucose metabolism. Combined PET/CT has proven effective in cases where conventional imaging is inadequate, both identifying and pinpointing lesions, which improves accuracy and clinical value. However, in PET/CT images of lung cancer, adhesion of the tumor to surrounding tissues leads to blurred edges and low contrast, and lesion areas are small and unevenly distributed in size. A cross-modal attention YOLOv5 (CA-YOLOv5) model for lung cancer detection is proposed in this paper to address these problems. The model focuses on the following. First, a two-branch parallel self-learning attention is designed in the backbone network; it learns scaling factors using instance normalization and measures the information content of each feature from the difference between the feature value and the mean value, enhancing cancer features and improving contrast. Second, cross-modal attention is designed for interactive learning of multimodal features, so that the model fully learns the dominant information of the 3D multimodal images; a Transformer models the long-range interdependence of deep and shallow features and learns key functional and anatomical information to improve lung cancer recognition. Third, a dynamic feature enhancement module is established to address the small lesion areas and uneven size distribution, using multi-branch grouped dilated and grouped deformable convolutions with different receptive fields, enabling the network to mine the multiscale semantic information of lung cancer features fully and efficiently.
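A minimal sketch of the cross-modal attention idea is given below, assuming standard multi-head cross-attention in which each modality's tokens query the other's; the module names, head count, and residual fusion are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Hypothetical sketch: PET and CT feature maps attend to each other."""

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        # channels must be divisible by heads.
        self.pet_from_ct = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.ct_from_pet = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, pet: torch.Tensor, ct: torch.Tensor) -> torch.Tensor:
        b, c, h, w = pet.shape
        # Flatten each H x W grid into a token sequence of shape (B, H*W, C).
        pet_seq = pet.flatten(2).transpose(1, 2)
        ct_seq = ct.flatten(2).transpose(1, 2)

        # Queries come from one modality, keys/values from the other, so
        # functional (PET) and anatomical (CT) cues are exchanged.
        pet_out, _ = self.pet_from_ct(pet_seq, ct_seq, ct_seq)
        ct_out, _ = self.ct_from_pet(ct_seq, pet_seq, pet_seq)

        fused = (pet_seq + pet_out) + (ct_seq + ct_out)  # residual fusion (assumed)
        return fused.transpose(1, 2).reshape(b, c, h, w)
```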
Result
In a comparison with 10 other methods on the lung cancer PET/CT dataset, CA-YOLOv5 obtained the best performance, with 97.37% precision, 94.01% recall, 96.36% mean average precision (mAP), and 95.67% F1 score, and its training time on the same device was the shortest. Compared with YOLOv5, the four metrics improved by 2.55%, 4.84%, 3.53%, and 3.49%, respectively. On the precision-recall (PR) curve, with recall and precision as the horizontal and vertical axes, the proposed model encloses the optimal area for every category, and on the F1-confidence curve it encloses the largest area while maintaining a high F1 score at high confidence levels. The heat maps of the proposed model not only cover all the labels but also focus accurately on the lesions. On the LUNA16 dataset, the proposed model obtained the best performance of 97.52% precision and 97.45% mAP, and its PR-curve coverage was the largest overall.
Conclusion
This paper establishes CA-YOLOv5, a lung cancer detection model. A lightweight and effective self-learning attention mechanism is designed to enhance cancer features and improve contrast. A Transformer is placed at the end of the backbone network to exploit the complementary advantages of convolution and self-attention and to extract local and global information from deep and shallow features. Dynamic feature enhancement modules are constructed in the feature enhancement neck to mine the multiscale semantic information of lung cancer features fully and efficiently. Experimental results on the two datasets show that the proposed model has superior lung cancer recognition and strong network characterization capabilities, effectively improving detection accuracy and reducing the missed-detection rate. The model thus facilitates computer-aided diagnosis and improves the efficiency of preoperative preparation. Its effectiveness and robustness are further verified using heat map visualization and the LUNA16 dataset, respectively.
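For the dynamic feature enhancement module, a minimal sketch of the multi-branch grouped dilated convolutions follows; the dilation rates, grouping, and fusion are assumptions, and the grouped deformable-convolution branch (e.g., via torchvision.ops.DeformConv2d) is omitted for brevity.

```python
import torch
import torch.nn as nn

class DynamicFeatureEnhancement(nn.Module):
    """Hypothetical sketch: parallel grouped dilated convolutions whose
    different dilation rates give the branches different receptive fields."""

    def __init__(self, channels: int, groups: int = 4):
        super().__init__()
        # channels must be divisible by groups.
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3,
                      padding=d, dilation=d, groups=groups)
            for d in (1, 2, 3)  # assumed rates for small/medium/large lesions
        ])
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate multi-receptive-field responses, fuse, keep a residual.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1)) + x
```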
YOLOv5 detection; self-learning attention; cross-modal attention; dynamic feature enhancement module; PET/CT lung cancer dataset