融合帧间时序关系的标准胎儿四腔心超声切面自动获取
Automatic capture for standard fetal cardiac four-chamber ultrasound view by fusing frame sequential relationships
- 2024, Vol. 29, No. 3, Pages 782-797
Print publication date: 2024-03-16
DOI: 10.11834/jig.230251
Xu Guangzhu, Wu Mengqi, Qian Yifan, Wang Yang, Liu Rong, Zhou Jun, Lei Bangjun. 2024. Automatic capture for standard fetal cardiac four-chamber ultrasound view by fusing frame sequential relationships. Journal of Image and Graphics, 29(03):0782-0797
Objective
When sonographers manually scan and capture fetal cardiac plane images, the frequent manual pause and screenshot operations often cause them to miss the best moment for acquiring a cardiac plane. When a deep object detection or classification network alone is used to capture planes automatically, the network cannot be guaranteed to focus on the fine-grained features of the relatively small cardiac region in the image, which leads to a high false detection rate. In addition, the optimal imaging moments of different cardiac anatomical parts are often asynchronous. To address these problems, this paper proposes an automatic acquisition algorithm for standard four-chamber (4CH) plane images that combines object detection and classification networks while fusing the temporal relationships between key frames.
Method
First, an object detection network is trained with a self-built fetal cardiac ultrasound plane dataset to quickly and accurately locate the four-chamber region and the descending aorta region. Then, when a descending aorta region is detected in the video frames within a certain time window, the candidate regions containing four-chamber targets are extracted and fed into a classification network trained on a self-built standard four-chamber region image set to further identify standard four-chamber regions. Finally, reliable descending aorta regions are determined through the temporal relationships, and the score of a standard four-chamber plane image is computed as a weighted combination of the detection confidence of the reliable descending aorta and the classification outputs for the four-chamber regions of the plane images within the same time window.
Result
Trained on the dataset constructed in this paper, the YOLOv5x (you only look once version 5 extra large) and Darknet53 models achieve 94.0% mAP@0.5, 61.1% mAP@[.5:.95], and 69.5% recall@0.5-0.95 on the detection of the four-chamber and descending aorta regions, and a TOP-1 accuracy of 92.4% on the standardness classification of four-chamber regions. After the detection and classification modules are combined, the false detection rate of the system for the four-chamber region decreases by 29.38%.
Conclusion
The strategy of combining object detection and classification networks, together with the inter-frame temporal information, can effectively reconcile the contradiction between false detections and missed detections and greatly reduce the false detection rate. In addition, besides automatically acquiring standard four-chamber plane images, the proposed algorithm can also recommend the best plane, which gives it good practical application value.
Objective
First-rank scan planes usually cannot be captured easily because of the frequent pause and screenshot operations and random fetal movements when sonographers manually scan the fetal heart region. This limitation discourages efficient screening. When deep neural networks designed for visual object detection or classification are adapted to automatically capture fetal cardiac ultrasound scan planes, they usually yield a high false detection rate. One possible reason is that they cannot be forced to focus on the fine-grained features within the relatively small cardiac region. Moreover, the optimal scanning moments for different cardiac parts are usually asynchronous, in which case object detection networks that rely on counting the cardiac parts coexisting at a given moment tend to miss numerous potential scan planes. To solve the preceding problems, this study focuses on the most critical fetal cardiac ultrasound scan plane, namely the four-chamber (4CH) scan plane, and proposes an automatic 4CH scan plane extraction algorithm that combines object detection and classification networks while considering the temporal relationships of key video frames.
Method
To address the lack of public datasets of fetal four-chamber echocardiographic images, 512 echocardiographic videos of 14- to 28-week-old fetuses were collected from our partners. Each video was recorded by experienced sonographers with mainstream ultrasound equipment. Most of these videos consist of continuous scan views from the gastric vesicle to the heart and then to the three vessels. When labeling the standard four-chamber planes, to ensure that the detection model learns sufficient information on the standard four-chamber scan plane, the standard four-chamber plane dataset used in the subsequent experiments was manually screened from the frames of videos Nos. 1–100 and Nos. 144–512 so that every image contains positive sample targets. The four-chamber heart region and the descending aorta (DAO) region in each image were then labeled. These standard four-chamber scan planes were divided into training, validation, and test sets at a ratio of 5:2:3 and used for the subsequent training and evaluation of the detection model. During the training of the detection and classification models, the YOLOv5x network was first trained with the labeled four-chamber scan plane image dataset. The trained detection model was then applied, under an appropriate threshold setting, to the previously unlabeled video frames (regarded as non-standard four-chamber planes), and the falsely detected images were extracted as the negative dataset for training the classification model. Lastly, the four-chamber regions were cropped according to the position coordinates of the manually labeled (standard) and mistakenly detected (non-standard) four-chamber regions to train the Darknet53 classification model.
During inference, the trained detection model is first used to locate the four-chamber and descending aorta regions rapidly and accurately. When a descending aorta region is detected in a video frame within a certain time window, the candidate regions containing four-chamber objects are extracted and sent to the classification model, trained with the self-built qualified four-chamber region dataset, to further identify qualified four-chamber regions. Lastly, the reliable descending aorta region is determined through the time-series relationship, and the score of a standard four-chamber scan plane is calculated as a weighted sum of the detection confidence of the reliable descending aorta and the quality metrics of the four-chamber regions of the frames in the same time window.
Result
Given that any fetal cardiac ultrasound video contains several standard four-chamber scan planes and that this research mainly studies their optimal automatic extraction, we focus on the false detection rate when analyzing the performance of the YOLOv5x (detection) and Darknet53 (classification) modules before and after their combination. The objective is to achieve a relatively low missed detection rate while ensuring a low false detection rate. Experimental results show that as the detection confidence threshold increases from 0.3 to 0.9, the false detection rate of YOLOv5x gradually decreases from 36.25% to 11.20%, but the missed detection rate continuously increases from 0.31% to 27.17%. Therefore, whether a standard four-chamber heart region exists in each frame cannot be determined by simply adjusting the detection confidence threshold of YOLOv5x. When the detection confidence threshold is set to 0.3 and the Darknet53 classification module is added, the system's missed detection rate increases by 19.72%, but its false detection rate decreases by 35.18%. When the detection confidence threshold is 0.4–0.6 and the Darknet53 classification module is combined, a few missed detections remain for the entire system, but its false detection rate is significantly reduced compared with using only the YOLOv5x detection module. Moreover, when the confidence threshold is 0.5, the overall error rate of the system reaches its lowest level of 21.06%, and the false detection rate decreases from 30.25% to 0.87% (a decrease of 29.38%).
When the detection confidence threshold is 0.7–0.9, combining the Darknet53 classification module can further reduce the false detection rate of the system, but the missed detection rate increases with the threshold (from 20.96% to 40.22%). Therefore, to ensure a low false detection rate while keeping the missed detection rate acceptable, a confidence threshold of 0.5 and an intersection-over-union threshold of 0.5 are adopted in this study. Although the experimental data show that the missed detection rate is nearly 21% in the best configuration, the false detection rate is the key index for the practical problem faced in this research, and the proposed algorithm reduces it to under 1%. In real application scenarios, an effective four-chamber video frame usually appears multiple times, so a low false detection rate combined with a moderately high missed detection rate can still meet actual needs.
Conclusion
Experimental results show that combining object detection and classification networks with inter-frame sequential information can effectively reconcile the contradiction between false and missed detections and significantly reduce the false detection rate. Moreover, the proposed algorithm can automatically extract standard four-chamber planes and also recommend the best one, which gives it good practical application value.
deep learning; convolutional neural network (CNN); object detection; image classification; frame sequential relationships
Abdi A H, Luong C, Tsang T, Allan G, Nouranian S, Jue J, Hawley D, Fleming S, Gin K, Swift J, Rohling R and Abolmaesumi P. 2017. Automatic quality assessment of echocardiograms using convolutional neural networks: feasibility on the apical four-chamber view. IEEE Transactions on Medical Imaging, 36(6): 1221-1230 [DOI: 10.1109/TMI.2017.2690836]
Baumgartner C F, Kamnitsas K, Matthew J, Fletcher T P, Smith S, Koch L M, Kainz B and Rueckert D. 2017. SonoNet: real-time detection and localisation of fetal standard scan planes in freehand ultrasound. IEEE Transactions on Medical Imaging, 36(11): 2204-2215 [DOI: 10.1109/TMI.2017.2712367]
Chen K, Wang J Q, Pang J M, Cao Y H, Xiong Y, Li X X, Sun S Y, Feng W S, Liu Z W, Xu J R, Zhang Z, Cheng D Z, Zhu C C, Cheng T H, Zhao Q J, Li B Y, Lu X, Zhu R, Wu Y, Dai J F, Wang J D, Shi J P, Ouyang W L, Loy C C and Lin D H. 2019. MMDetection: open MMLab detection toolbox and benchmark [EB/OL]. [2023-05-28]. https://arxiv.org/pdf/1906.07155.pdf
Del Bianco A, Russo S, Lacerenza N, Rinaldi M, Rinaldi G, Nappi L and Greco P. 2006. Four chamber view plus three-vessel and trachea view for a complete evaluation of the fetal heart during the second trimester. Journal of Perinatal Medicine, 34(4): 309-312 [DOI: 10.1515/JPM.2006.059]
Dudley N J and Chapman E. 2002. The importance of quality management in fetal measurement. Ultrasound in Obstetrics and Gynecology, 19(2): 190-196 [DOI: 10.1046/j.0960-7692.2001.00549.x]
Gao S H, Cheng M M, Zhao K, Zhang X Y, Yang M H and Torr P. 2021. Res2Net: a new multi-scale backbone architecture [EB/OL]. [2023-05-28]. https://arxiv.org/pdf/1904.01169v1.pdf
Ge Q Q, Zhang Z J, Yuan L, Li X M and Sun J M. 2021. Safety helmet wearing detection method of fusing environmental features and improved YOLOv4. Journal of Image and Graphics, 26(12): 2904-2917 [DOI: 10.11834/jig.200606]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Jeanty P, Chaoui R, Tihonenko I and Grochal F. 2007. A review of findings in fetal cardiac section drawings: Part 1: the 4-chamber view. Journal of Ultrasound in Medicine, 26(11): 1601-1610 [DOI: 10.7863/jum.2007.26.11.1601]
Jiang J H. 2019. Automatic Classification and Parameter Measurement of Echocardiogram Using Deep Learning. Nanjing: Southeast University
Komatsu M, Sakai A, Komatsu R, Matsuoka R, Yasutomi S, Shozu K, Dozen A, Machino H, Hidaka H, Arakaki T, Asada K, Kaneko S, Sekizawa A and Hamamoto R. 2021. Detection of cardiac structural abnormalities in fetal ultrasound videos using deep learning. Applied Sciences, 11(1): #371 [DOI: 10.3390/app11010371]
Kong P Y. 2020. Research on the Application of Deep Learning in Prenatal Ultrasound Standard Views Classification, Visualization and Temporal Learning. Shenzhen: Shenzhen University
Lange L W, Sahn D J, Allen H D, Goldberg S J, Anderson C and Giles H. 1980. Qualitative real-time cross-sectional echocardiographic imaging of the human fetus during the second half of pregnancy. Circulation, 62(4): 799-806 [DOI: 10.1161/01.CIR.62.4.799]
Li H Q. 2021. Fine-grained Fetal Ultrasound Scan Planes Selection Based on Multi-label Joint Learning. Changsha: Hunan University
Lin T Y, Goyal P, Girshick R, He K M and Dollár P. 2017. Focal loss for dense object detection//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2999-3007 [DOI: 10.1109/ICCV.2017.324]
Liu S F, Wang Y, Yang X, Lei B Y, Liu L, Li S X, Ni D and Wang T F. 2019. Deep learning in medical ultrasound analysis: a review. Engineering, 5(2): 261-275 [DOI: 10.1016/j.eng.2018.11.020]
MMClassification Contributors. 2020. OpenMMLab's image classification toolbox and benchmark [EB/OL]. [2023-05-28]. https://github.com/open-mmlab/mmclassification
Rahmatullah B, Sarris I, Papageorghiou A and Noble J A. 2011. Quality control of fetal ultrasound images: detection of abdomen anatomical landmarks using AdaBoost//Proceedings of 2011 IEEE International Symposium on Biomedical Imaging: From Nano to Macro. Chicago, USA: IEEE: 6-9 [DOI: 10.1109/ISBI.2011.5872342]
Redmon J and Farhadi A. 2018. YOLOv3: an incremental improvement [EB/OL]. [2023-05-28]. https://arxiv.org/pdf/1804.02767.pdf
Ren S Q, He K M, Girshick R and Sun J. 2017. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6): 1137-1149 [DOI: 10.1109/TPAMI.2016.2577031]
Salomon L J, Bernard J P, Duyme M, Doris B, Mas N and Ville Y. 2006. Feasibility and reproducibility of an image-scoring method for quality control of fetal biometry in the second trimester. Ultrasound in Obstetrics and Gynecology, 27(1): 34-40 [DOI: 10.1002/uog.2665]
Salomon L J and Ville Y. 2005. Quality control of prenatal ultrasound. The Ultrasound Review of Obstetrics and Gynecology, 5(4): 297-303 [DOI: 10.1080/14722240500415419]
Jocher G. 2020. YOLOv5 by Ultralytics (Version 6.2) [EB/OL]. [2023-05-28]. https://github.com/ultralytics/yolov5
Jocher G, Chaurasia A and Qiu J. 2023. Ultralytics YOLO (Version 8.0.0) [EB/OL]. [2023-05-28]. https://github.com/ultralytics/ultralytics
Vullings R. 2019. Fetal electrocardiography and deep learning for prenatal detection of congenital heart disease//Proceedings of 2019 Computing in Cardiology. Singapore, Singapore: IEEE: 1-4 [DOI: 10.22489/CinC.2019.072]
Wang C Y, Bochkovskiy A and Liao H Y M. 2022. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors [EB/OL]. [2023-05-28]. https://arxiv.org/pdf/2207.02696.pdf
Wang Y, Ge X K, Ma H, Qi S L, Zhang G J and Yao Y D. 2021. Deep learning in medical ultrasound image analysis: a review. IEEE Access, 9: 54310-54324 [DOI: 10.1109/ACCESS.2021.3071301]
Yagel S, Cohen S M and Achiron R. 2001. Examination of the fetal heart by five short-axis views: a proposed screening method for comprehensive cardiac evaluation. Ultrasound in Obstetrics and Gynecology, 17(5): 367-369 [DOI: 10.1046/j.1469-0705.2001.00414.x]
Yan A M. 2020. Fetal Cardiac Cycle Extraction Algorithm Based on Ultrasound Video. Changsha: Hunan University
Yi J, Kang H K, Kwon J H, Kim K S, Park M H, Seong Y K, Kim D W, Ahn B, Ha K, Lee J, Hah Z and Bang W C. 2021. Technology trends and applications of deep learning in ultrasonography: image quality enhancement, diagnostic support, and improving workflow efficiency. Ultrasonography, 40(1): 7-22 [DOI: 10.14366/usg.20102]
Zhang H, Wu C R, Zhang Z Y, Zhu Y, Lin H B, Zhang Z, Sun Y, He T, Mueller J, Manmatha R, Li M and Smola A. 2020. ResNeSt: split-attention networks [EB/OL]. [2023-05-28]. https://arxiv.org/pdf/2004.08955.pdf