YOLO-SF-TV: transcranial ultrasound images of the third ventricle detection model
2024, pp. 1-12
Online publication date: 2024-10-16
DOI: 10.11834/jig.240293
Wan Ao, Gao Hongling, Zhou Xiao, et al. YOLO-SF-TV: transcranial ultrasound images of the third ventricle detection model[J]. Journal of Image and Graphics, 2024: 1-12.
Objective
Transcranial ultrasound imaging, as an efficient, low-cost, and noninvasive diagnostic tool, has gradually been applied to the diagnosis of cognitive impairment in patients with Parkinson's disease. Because transcranial ultrasound images have a low signal-to-noise ratio and poor imaging quality, and the target tissues are complex and highly similar, detection relies on manual inspection by specialized physicians. Manual detection, however, is time-consuming and labor-intensive, and the operator's subjective judgment can introduce variability into the results. To address this problem, this paper proposes YOLO-SF-TV (YOLO network based on Swin Transformer and multi-scale deep feature fusion for third ventricle), a model for third-ventricle detection in transcranial ultrasound images, to improve clinical detection accuracy and assist physicians in early diagnosis.
Method
Building on YOLOv8, the YOLO-SF-TV model uses the window-attention-based Swin Transformer as its feature extraction network and introduces the spatial pyramid pooling module SPP-FCM (spatial pyramid pooling fast incorporating CSPNet and multiple attention mechanisms) to enlarge the receptive field and strengthen multi-scale feature fusion. In the multi-scale feature fusion part of the network, the PAFPN-DM (path aggregation and feature pyramid network with depthwise separable convolution) module is proposed by combining depthwise separable convolution with a multi-head attention mechanism, and multi-head attention is added to the backbone feature output layers to improve the network's understanding of globally and locally important information in feature maps at different scales. Meanwhile, traditional convolutions are replaced with depthwise separable convolution modules, which convolve each channel separately to increase the network's sensitivity to individual channels, preserving model accuracy while reducing the number of training parameters and the training difficulty, and enhancing the model's generalization ability.
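The parameter saving from the depthwise separable convolution described above can be illustrated with simple weight-count arithmetic. This is a minimal sketch: it counts only kernel weights (bias terms omitted), and the channel sizes are illustrative rather than taken from the model.

```python
def standard_conv_params(c_in, c_out, k):
    # standard conv: one k x k kernel over all input channels per output channel
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    # depthwise: one k x k kernel per input channel;
    # pointwise: a 1x1 conv that mixes channels
    return k * k * c_in + c_in * c_out

# example: a 3x3 conv mapping 256 -> 256 channels
std = standard_conv_params(256, 256, 3)        # 589824 weights
dsc = depthwise_separable_params(256, 256, 3)  # 67840 weights
print(std, dsc, round(std / dsc, 1))           # roughly 8.7x fewer parameters
```

The ratio grows with channel count, which is why the substitution cuts training parameters with little loss of accuracy.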
Result
The experiments were conducted on a dataset of transcranial ultrasound third-ventricle images and corresponding labels collected for this paper, and the model was compared with typical object detection models. The results show that the proposed YOLO-SF-TV achieves an mAP of 98.69% on transcranial ultrasound third-ventricle targets, a 2.12% improvement over YOLOv8 and the best detection accuracy among the compared models.
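The mAP figure reported above rests on intersection-over-union (IoU) matching between predicted and ground-truth boxes. The following is a generic IoU sketch, not code from the paper; boxes are assumed to be in corner format (x1, y1, x2, y2).

```python
def iou(box_a, box_b):
    # intersection-over-union of two axis-aligned boxes in (x1, y1, x2, y2) form
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.1429
```

A prediction whose IoU with a ground-truth box exceeds a threshold (commonly 0.5) counts as a true positive when precision-recall curves, and hence mAP, are computed.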
Conclusion
The proposed YOLO-SF-TV model performs well on third-ventricle detection in transcranial ultrasound images. The SPP-FCM and PAFPN-DM modules enhance the model's detection capability and improve its generalization and robustness, and the dataset constructed in this paper will help advance research on third-ventricle detection in transcranial ultrasound images.
Objective
Cognitive impairment is the most dangerous non-motor symptom in patients with Parkinson's disease (PD): roughly 25%-30% of patients develop it each year. It not only seriously affects patients' quality of life but also increases the risk of death. However, the accuracy of clinical diagnosis of cognitive impairment in PD is still limited, and the proportion of PD patients diagnosed before the age of 50 is less than 4%. In recent years, some scholars have proposed that detecting the third ventricle with transcranial ultrasound imaging can assist doctors in diagnosing cognitive impairment in PD patients. As a rapid, noninvasive, and low-cost method, transcranial ultrasound imaging has gradually been applied to the diagnosis of cognitive dysfunction in PD, helping doctors find the disease in time and treat it as early as possible. Because transcranial ultrasound images have a low signal-to-noise ratio and poor imaging quality, and the target tissues are complex and highly similar, detection must rely on manual inspection by specialized physicians. Manual detection, however, is time-consuming and labor-intensive, and the operator's subjective judgment may introduce variability into the results. In recent years, deep learning has been increasingly integrated into medicine; in particular, deep-learning-based computer-aided diagnosis (CAD) systems have been used to diagnose PD with good results. In this paper, a YOLO-SF-TV network based on Swin Transformer and multi-scale feature fusion is proposed for third-ventricle detection in transcranial ultrasound images, to assist physicians in early diagnosis.
Method
For the experiments, 2400 transcranial ultrasound images of the third ventricle and the corresponding labels were collected to form a dataset, with the third-ventricle region in each image manually annotated by a professional. The YOLO-SF-TV network consists of Backbone, Neck, and Head components, which respectively extract image features, fuse image features, and detect and classify targets. The algorithm is based on YOLOv8 and uses the window-attention-based Swin Transformer to improve the backbone network and strengthen its ability to model global information. At the same time, the spatial pyramid pooling module SPP-FCM is connected to the Swin Transformer network to enlarge the receptive field and integrate multi-scale information. The SPP-FCM structure combines the characteristics of the CSPC structure in YOLOv7 and introduces a multi-head attention mechanism (MHAM) in the multilevel pooling part, which reduces the model's sensitivity to noise and outliers when extracting multidimensional features. In the multi-scale feature fusion (PAFPN) part of the network, the PAFPN-DM module is proposed by combining depthwise separable convolution and the multi-head attention mechanism, and multi-head attention is added to the backbone feature output layer to improve the network's understanding of globally and locally important information in feature maps at different scales. In addition, traditional convolutions are replaced with depthwise separable convolution modules, which convolve each channel individually to increase the network's sensitivity to different channels, preserving model accuracy while reducing training parameters and difficulty and enhancing generalization.
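The window attention in the Swin Transformer backbone mentioned above computes self-attention within non-overlapping local windows of the feature map. Below is a minimal NumPy sketch of only the window-partitioning step, assuming H and W are divisible by the window size; the array layout and names are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def window_partition(x, ws):
    # x: (H, W, C) feature map -> (num_windows, ws, ws, C) stack of local windows
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    # bring the two window-grid axes together, then flatten them
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws, ws, C)

feat = np.arange(8 * 8).reshape(8, 8, 1)  # toy 8x8 single-channel map
wins = window_partition(feat, 4)
print(wins.shape)  # (4, 4, 4, 1): four 4x4 windows
```

Attention is then computed independently inside each window, which keeps the cost linear in image size instead of quadratic as in global self-attention.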
Result
To validate the performance of the different networks, five-fold cross-validation was performed on the dataset: the data were randomly divided into five equal folds, four of which were used as the training set and the remaining one as the test set in each round. Training images were resized to 640×640, and the training set was expanded with data augmentation methods such as random flipping, random-angle rotation, and Mosaic. The initial learning rate was set to 0.001 and decayed to 0.1 times its value every 50 epochs, with a momentum of 0.9, a weight decay coefficient of 0.0005, and a batch size of 8. The experiments used a GeForce RTX 3090 GPU under Ubuntu 20.04 with the PyTorch framework, and mAP was used as the measure of detection performance. The experimental results show that YOLO-SF-TV achieves 98.69% detection accuracy (mAP) on transcranial ultrasound third-ventricle targets, a 2.12% improvement over YOLOv8 and the best accuracy among the typical models compared.
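The five-fold protocol described above can be sketched as follows. The split logic is generic (a seeded random.Random is assumed for reproducibility); only the dataset size of 2400 images comes from the paper.

```python
import random

def five_fold_splits(n_samples, seed=0):
    # shuffle indices once, cut into 5 equal folds;
    # each fold serves exactly once as the test set
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    fold = n_samples // 5
    folds = [idx[i * fold:(i + 1) * fold] for i in range(5)]
    for k in range(5):
        test = folds[k]
        train = [i for j in range(5) if j != k for i in folds[j]]
        yield train, test

splits = list(five_fold_splits(2400))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 5 1920 480
```

Reporting the mean metric over the five test folds reduces the variance that a single random train/test split would introduce.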
Conclusion
The YOLO-SF-TV model proposed in this paper performs excellently on third-ventricle detection in transcranial ultrasound images. The SPP-FCM and PAFPN-DM modules enhance the model's detection capability and improve its generalization and robustness. The dataset produced in this paper will help promote research on third-ventricle detection in transcranial ultrasound.
Keywords: transcranial ultrasound imaging; computer-aided diagnosis; third ventricle; deep learning; YOLOv8; Swin Transformer