Integrating spatiotemporal features and temporal constraints for dual-modal breast tumor diagnosis
2025, Vol. 30, No. 1, Pages 268-281
Print publication date: 2025-01-16
DOI: 10.11834/jig.240217
Li Yichen, Chen Dali, Guo Dinghao and Sun Yu. Integrating spatiotemporal features and temporal constraints for dual-modal breast tumor diagnosis[J]. Journal of Image and Graphics, 2025, 30(1): 268-281.
Objective
Comprehensively considering B-mode ultrasound (B-US) and contrast-enhanced ultrasound (CEUS) dual-modal information helps improve the accuracy of breast tumor diagnosis and thus patient survival. However, most current models focus only on B-US feature extraction, neglecting the learning of CEUS features and the fused processing of dual-modal information. To address these problems, a spatio-temporal feature and temporal-constrained model (STFTCM) for dual-modal breast tumor diagnosis is proposed.
Method
First, given the data characteristics of the two modalities, a heterogeneous dual-branch network is adopted to learn the spatiotemporal features contained in B-US and CEUS. Then, a temporal attention loss function is designed to guide the CEUS branch to focus on the time window during which the contrast agent flows into the lesion area and to extract CEUS features within this window. Finally, feature fusion modules establish lateral connections between the two branches, completing dual-modal feature fusion by feeding B-US features into the CEUS branch as supplementary information.
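As an illustration of the time-window idea described above, the following is a minimal sketch (not the authors' code) that locates the contrast-inflow window from the first-order difference of mean frame luminance and converts it into a binary temporal mask; the function name and the relative threshold `rel_thresh` are assumptions for illustration.

```python
# Minimal sketch (illustrative, not the published implementation): locate the
# contrast-inflow time window from the first-order difference of mean frame
# luminance, then build a binary temporal mask over the frames.
import numpy as np

def temporal_attention_mask(frames: np.ndarray, rel_thresh: float = 0.5) -> np.ndarray:
    """frames: (T, H, W) grayscale CEUS frames.
    Returns a (T,) float mask that is 1 inside the inflow window, else 0."""
    luminance = frames.reshape(frames.shape[0], -1).mean(axis=1)  # mean brightness per frame
    diff = np.diff(luminance)                                     # first-order difference, length T-1
    peak = np.abs(diff).max()
    # Frames whose brightness change exceeds a fraction of the peak change are
    # taken to belong to the inflow window (threshold choice is an assumption).
    active = np.flatnonzero(np.abs(diff) >= rel_thresh * peak)
    mask = np.zeros(frames.shape[0], dtype=np.float32)
    if active.size > 0:
        # diff index i measures the change between frames i and i+1,
        # so the last involved frame is active[-1] + 1 (slice end is exclusive).
        mask[active[0]: active[-1] + 2] = 1.0
    return mask
```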
Result
In comparative experiments on the collected dataset, STFTCM performs excellently in accuracy, sensitivity, macro-average F1, and area under the curve (AUC), achieving a prediction accuracy of 88.2% and outperforming other state-of-the-art models. In ablation experiments, the temporal attention constraint raises prediction accuracy by 5.8%, and feature fusion raises diagnostic accuracy by at least 2.9% over the single-branch models.
Conclusion
The proposed STFTCM effectively extracts and fuses B-US and CEUS dual-modal information and yields accurate diagnoses. Moreover, the temporal attention constraint and the feature fusion modules significantly improve model performance.
Objective
Breast cancer ranks first in cancer incidence among women worldwide, threatening the health of the female population. Timely diagnosis of breast tumors offers patients better treatment opportunities. B-mode ultrasound (B-US) imaging contains rich spatial information, such as lesion size and morphology, and is widely used in breast tumor diagnosis because of its low cost and high safety. With the advancement of deep learning, some deep learning models have been applied to computer-aided breast tumor diagnosis based on B-US to assist doctors. However, diagnosis based solely on B-US imaging suffers from low specificity, and the performance of models trained exclusively on B-US is limited by the single modality of the information source. Contrast-enhanced ultrasound (CEUS) can provide a second modality of information on top of B-US to improve diagnostic accuracy: by injecting a contrast agent intravenously and imaging during the time window in which the agent flows into the lesion area, CEUS captures rich spatiotemporal information, such as brightness enhancement and vascular distribution in the lesion area. Comprehensively considering B-US and CEUS dual-modal information can therefore enhance diagnostic accuracy. A spatiotemporal feature and temporal-constrained model (STFTCM) for dual-modal breast tumor diagnosis is proposed to effectively utilize dual-modal data.
Method
STFTCM primarily comprises a heterogeneous dual-branch network, a temporal attention constraint module, and feature fusion modules. Based on the dimensional characteristics of the dual-modal data, STFTCM adopts a heterogeneous dual-branch structure to extract the features of each modality separately. For the B-US branch: B-US information consists mainly of spatial features within individual two-dimensional video frames, and inter-frame changes are not prominent. Because training a 3D convolutional network on a small dataset tends to overfit owing to its larger parameter count compared with a 2D network, a 2D network, ResNet-18, is used as the backbone to extract features from a single frame sampled from the video. In contrast, CEUS frames change noticeably during the time window when the contrast agent flows through the lesion area and therefore contain rich spatiotemporal information, so a 3D network, R(2 + 1)D-18, is used as the backbone of the CEUS branch. The backbone structures are adjusted so that feature maps extracted from corresponding layers of the two branches have the same dimensions for subsequent fusion. Within the CEUS branch, the spatiotemporal information mainly resides in the time window when the contrast agent flows into the lesion area, so guiding the model to focus on this segment facilitates better learning of CEUS features on a small dataset. To this end, a temporal attention loss function is proposed. Drawing on temporal prior knowledge of CEUS videos, it first determines the temporal attention boundaries from the first-order difference of the discrete sequence of CEUS frame luminance values and then establishes a temporal mask. Based on this mask, the loss guides the updating of the parameters of the R(2 + 1)D temporal convolutional kernels in the CEUS branch, directing the model to focus on the period when the contrast agent flows into the lesion area. Furthermore, feature fusion modules are introduced to fuse the dual-modal information and thus improve prediction accuracy by jointly considering B-US and CEUS. To control model size, no separate third fusion branch is built; instead, the feature fusion modules implement lateral connections between the two branches, incorporating B-US spatial information into the CEUS branch as supplementary data. Each feature fusion module comprises a spatial feature fusion module and an identity mapping branch: the former combines the two-dimensional spatial feature maps of B-US with the three-dimensional spatiotemporal feature maps of CEUS, while the latter prevents the loss of the original CEUS features during fusion.
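To make the lateral-connection fusion concrete, below is a hedged PyTorch sketch of one possible fusion block under the stated design: a 2D B-US feature map is broadcast along time, concatenated with the 3D CEUS feature map, mixed by a 1 × 1 × 1 convolution, and added back to the original CEUS features through an identity branch. The module name, layer choices, and tensor shapes are illustrative assumptions, not the published implementation.

```python
# Illustrative sketch of a lateral fusion block between a 2D B-US branch and a
# 3D CEUS branch whose corresponding feature maps share channel/spatial sizes.
import torch
import torch.nn as nn

class LateralFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # 1x1x1 conv mixes the concatenated B-US and CEUS channels back to `channels`.
        self.mix = nn.Conv3d(2 * channels, channels, kernel_size=1)
        self.norm = nn.BatchNorm3d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, ceus: torch.Tensor, bus: torch.Tensor) -> torch.Tensor:
        # ceus: (N, C, T, H, W); bus: (N, C, H, W)
        t = ceus.shape[2]
        bus3d = bus.unsqueeze(2).expand(-1, -1, t, -1, -1)  # broadcast B-US along time
        fused = self.act(self.norm(self.mix(torch.cat([ceus, bus3d], dim=1))))
        return fused + ceus  # identity branch preserves the original CEUS features

# Hypothetical usage with stage-level feature maps of matching size:
fusion = LateralFusion(channels=256)
out = fusion(torch.randn(2, 256, 8, 14, 14), torch.randn(2, 256, 14, 14))
```

The residual addition here is one plausible reading of the "identity mapping branch"; the key property it illustrates is that fusion cannot erase the CEUS features it supplements.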
Result
Comparative and structural ablation experiments are conducted to evaluate the performance of STFTCM and the effectiveness of the temporal attention constraint and feature fusion modules. The experimental data are obtained from Shengjing Hospital of China Medical University and comprise 332 contrast-enhanced ultrasound videos categorized into benign tumors, malignant tumors, and inflammations, with 101, 102, and 129 instances, respectively. Accuracy, sensitivity, specificity, macro-average F1, and area under the curve (AUC) are used as evaluation metrics. In the comparative experiments, STFTCM achieves an accuracy of 0.882, with scores of 0.909, 0.870, 0.883, and 0.952 on the other four metrics, respectively, outperforming other advanced models. In single-branch comparison experiments, both the B-US and CEUS branches of STFTCM perform better than other advanced models, and the comparison between dual-branch and single-branch models further demonstrates the excellent performance of STFTCM. The structural ablation results show that the temporal attention loss constraint improves prediction accuracy by 5.8 percentage points and that dual-modal feature fusion improves prediction accuracy by at least 2.9 percentage points over unimodal prediction, confirming the effectiveness of the two modules. Additionally, visualizing model attention with class activation maps validates that the temporal attention constraint improves the model's attention in the temporal dimension, guiding better extraction of the spatiotemporal information contained in CEUS. Results of the experiments on the feature fusion module show that adding the identity mapping branch improves prediction accuracy by 2.9 percentage points, further confirming the rationality of the module's structural design.
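For clarity on how the reported metrics can be computed for the three-class task (benign / malignant / inflammation), the following hedged scikit-learn snippet shows one standard recipe for accuracy, macro-average sensitivity (recall), macro-average F1, and one-vs-rest AUC; the arrays are dummy placeholders, not the study's data.

```python
# Dummy-data sketch of the evaluation metrics for a 3-class classifier.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, recall_score, roc_auc_score

rng = np.random.default_rng(0)
y_true = np.array([0, 1, 2, 1, 0, 2])            # ground-truth class labels
y_prob = rng.dirichlet(np.ones(3), size=6)       # dummy per-class probabilities
y_pred = y_prob.argmax(axis=1)                   # hard predictions

acc = accuracy_score(y_true, y_pred)
macro_f1 = f1_score(y_true, y_pred, average="macro")
macro_sens = recall_score(y_true, y_pred, average="macro")  # sensitivity = recall
auc = roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro")
print(f"acc={acc:.3f} macro-F1={macro_f1:.3f} sensitivity={macro_sens:.3f} AUC={auc:.3f}")
```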
Conclusion
STFTCM, designed on the basis of prior knowledge, demonstrates excellent performance in breast tumor diagnosis. The heterogeneous dual-branch structure, designed around the characteristics of the dual-modal data, effectively extracts B-US and CEUS features while reducing the number of parameters, enabling better optimization on small datasets. The temporal attention loss function constrains the model's attention in the temporal dimension, guiding it to focus on the time window when the contrast agent flows into the lesion area. Furthermore, the feature fusion modules effectively implement lateral connections between the two branches to fuse the dual-modal features.
Keywords: dual-modal breast tumor diagnosis; spatiotemporal feature; temporal attention constraint; contrast-enhanced ultrasound (CEUS); B-mode ultrasound (B-US)