Latest Issue

    Vol. 29, Issue 3, 2024

      Review

    • Chen Jian,Guang Mengting,Lu Ranlin,Luo Qin,Wei Lifang,Shen Dinggang
      Vol. 29, Issue 3, Pages: 561-585(2024) DOI: 10.11834/jig.230321
      Research progress on fetal brain magnetic resonance image segmentation
      摘要:Medical imaging is an important tool for prenatal screening, diagnosis, treatment guidance, and evaluation, which can effectively avoid abnormal development of the fetal central nervous system, especially for the fetal brain. Medical imaging is mainly operated through X-ray, ultrasound, computed tomography (CT), magnetic resonance imaging (MRI), and other technologies. MRI is a typical non-invasive imaging technology and has become increasingly important for prenatal diagnosis in recent years. MR images are able to produce high tissue contrast, spatial resolution and comprehensive information to facilitate the diagnosis and treatment of the diseases. The automatic measurement of fetal brain MR images can realize the quantitative evaluation of fetal brain development, resulting in improved efficiency and accuracy of diagnoses. The realization of automatic, quantitative, and accurate analysis of fetal brain MR images depends on reliable image segmentation. Therefore, fetal brain MR image segmentation is of vital clinical significance and research value. Owing to the multiple tissues and organs around the fetal brain, poor image quality, and rapid brain structure changes, fetal brain MR image segmentation encounters numerous challenges. This paper summarizes fetal brain MR image segmentation methods. First, the main public atlases and data sets of fetal brain MR images are introduced in detail, including seven publicly available fetal brain MR atlases/data sets. The segmentation labels, acquisition parameters, and some other information of atlases/data sets are described. The links for the atlases/data sets are provided. Second, image segmentation methods, such as brain extraction and tissue/lesion segmentation, are classified and analyzed. Brain extraction methods are categorized into thresholding, region growing, atlas fusion, and classification techniques. Classification techniques are subdivided into traditional machine learning- and deep learning-based methods. Deep learning can automatically learn deep and discriminative features from data, and the performance is significantly improved compared with other methods. Most deep learning-based methods are based on U-Net. And the multi-stage extraction framework is commonly used. Therefore, we further subdivide the deep learning-based methods into single- and multi-stage strategies with U-Net and other convolutional neural networks. Tissue/lesion segmentation methods are categorized into atlas fusion and classification techniques. For the tissue segmentation methods, classification techniques are subdivided into traditional machine learning- and deep learning-based methods. Similarly, we further subdivide the deep learning-based methods with U-Net, other convolutional neural networks, and Transformer-based neural networks. In each subsection, we compare the performance of different methods. Lastly, by analyzing the methods, the challenges and future research directions of fetal brain MR image segmentation are summarized and prospected. The conclusions are as follows. 1) One main issue of fetal brain image analysis is that there are only a few publicly available data sets, and the sample size of the available data sets is small. Accordingly, evaluating performance uniformly with private data sets is common. At present, there are only three publicly available data sets, and images for fetal brain lesions are limited. Moreover, some data sets were not annotated. 
Therefore, lack of annotated data is an issue that seriously restricts the extensive and in-depth application of deep learning-based methods. 2) Data in the existing data sets are insufficient to support clinical application. At present, most existing fetal brain atlases or datasets are collected from 1.5 T devices. These data have undergone numerous preprocessing operations to derive high-resolution MR images. However, in clinical applications, most images are still the original ones from 1.5 T devices. In addition, image quality and resolution differ significantly among public atlases/data sets. Consequently, the segmentation models trained on available atlases/data sets cannot be directly applied to clinical images obtained from MRI devices. 3) Although existing deep learning-based methods have been applied to fetal brain image segmentation, most studies only apply existing deep learning-based methods without sufficient innovation or consideration of the anatomy and image characteristics of fetal brains. 4) The current performance of deep learning-based methods is still unsatisfactory, and the low accuracy of fetal brain extraction and tissue/lesion segmentation will affect the final diagnosis. 5) Most methods do not consider image degradation issues, such as motion artifacts and blurred boundaries. 6) Variability in clinical imaging equipment, parameters, and other factors leads to significant differences in fetal brain magnetic resonance images, resulting in limited generalization ability of current segmentation methods. Several possible research directions are presented. 1) Extensive research on automatic annotation is highly required, especially for deep learning-based methods. 2) Additional information on the fetal brain, such as anatomical structure and image characteristics, can be further studied. 3) Deep learning-based methods, such as weakly supervised, transfer, and self-supervised learning, can be used to compensate for data sets with minimal or no ground truth. 4) Researchers could combine segmentation with super-resolution or motion correction, or design deep networks tailored to low-quality images, to address practical problems of fetal brain MR images. 5) Researchers could enhance the generalization ability of models by combining transfer learning, data augmentation, and training with multimodal data.
      Keywords: fetal brain; magnetic resonance imaging (MRI); dataset; image segmentation; deep learning
      Published: 2024-03-13
    • Zhao Yang,Li Juncheng,Cheng Bodong,Niu Najun,Wang Longguang,Gao Guangwei,Shi Jun
      Vol. 29, Issue 3, Pages: 586-607(2024) DOI: 10.11834/jig.230062
      Applications and challenges of deep learning in dental imaging
      摘要:Dental imaging is an essential tool for the detection, screening, diagnosis, and therapeutic evaluation of clinical oral diseases, and the accurate analysis of the images is vital to the development of subsequent treatment plans. Deep learning, which is widely used in many fields, such as machine translation, speech recognition, and computer vision, can automatically learn and obtain superior feature expressions from large sample data, thereby improving the efficiency and performance of various machine learning tasks. With the integration of artificial intelligence and various fields, smart healthcare has become an important application area of deep learning, providing an effective way of solving the following clinical problems: 1) the shortage of experienced radiologists in the field of dentistry cannot meet the rapidly growing medical demand. 2) Despite sufficient medical resources, the number of experienced physicians cannot meet the rapidly growing medical demand. 3) Different physicians have different interpretations of the same oral image, which are influenced by subjectivity. Deep learning-based dental image processing is currently a popular research topic. The inherent specificity and complexity of the medical field and the problem of insufficient dental image data samples bring new challenges to the application of deep learning methods in relevant learning tasks and scenarios. This work mainly reviews the various applications of deep learning methods using three major dental imaging methods (i.e., two-dimensional oral X-ray images, three-dimensional tooth point cloud/mesh images, cone beam computed tomography (CBCT)). These applications include tooth segmentation, caries detection, and tumor detection. The reviews on two-dimensional oral X-ray images focus on bitewing, periapical, and panoramic X-rays based on deep learning methods. Bitewing X-rays usually show the contact surface from the distal end of the canine to the most distal molar and are mainly used to diagnose proximal caries, assess the extent of caries, and identify secondary caries under existing restorations. For caries detection using bitewing X-rays, we mainly review deep learning methods using convolutional neural network architectures, such as full convolutional neural networks and U-Net architectures. Periodontitis and caries detection using periapical X-rays primarily introduces methods based on convolutional neural networks and backward propagation neural network. In contrast to bitewing and periapical X-rays, panoramic X-rays show not only teeth and gums but also the jaw, the skull, the spine, and other bones. We provide a focused review of the application of deep learning methods in panoramic radiographic images from three detection categories of directions: tooth detection and numbering, tooth segmentation, and non-dental disease detection. The three-dimensional tooth point cloud/mesh image is a digital 3D oral model obtained by scanning and reconstructing the patient’s mouth in real time using an intraoral scanner. The reviews on the three-dimensional tooth point cloud/mesh image focus on tooth segmentation based on deep learning. The deep learning methods for tooth segmentation can be divided into two categories: fully supervised and non-full supervision methods. For fully supervised methods, we mainly introduce the hierarchical and end-to-end network architecture models with a large amount of labeled data and annotation. 
For non-fully supervised methods, we primarily review self- and semi-supervised learning methods that only require partially annotated data, as well as methods that exploit weak annotations. Currently, CBCT imaging, which is a non-invasive, low-radiation technique, is widely used in dental diagnosis and treatment. This paper summarizes the deep learning methods for CBCT images, focusing on three major areas: tooth segmentation, dental implants, and oral and maxillofacial surgery. Nevertheless, analyzing all aspects of the patient’s oral and systemic health status to develop a personalized and high-level treatment plan for oral diseases using the deep learning methods that have been proposed is still difficult. Deep learning has made some progress in the field of dental image processing but still faces some serious challenges. The small sample size has been a serious problem in the field of medical image analysis but can be effectively addressed using non-fully supervised deep learning methods, such as weakly supervised learning and self-supervised learning, together with machine learning methods, including transfer learning, few-shot learning, and incremental learning. In addition, the annotation of dental medical images is a time-consuming and laborious task that relies heavily on the experience of the practitioner, which is one of the barriers limiting the extensive and intensive application of deep learning. Therefore, automatic data annotation must be extensively studied. The development of deep learning applications in dental imaging is still in a relatively early stage. The development of this field cannot be achieved without the cooperation of computer scientists, clinicians, and experts in imaging equipment and software development to solve the problem of deploying lightweight deep learning networks in convenient medical devices. Therefore, the combination of deep learning and dental image analysis has become a major trend, with significant results in various analysis tasks, but further research is required to lead the development of dental image analysis into a new phase.
      Keywords: deep learning; dental imaging; tooth detection and segmentation; caries detection; computer-aided diagnosis (CAD)
      Published: 2024-03-13

      Dataset

    • Wang Huadeng,Wang Xuexin,Li Bingbing,Liu Zhipeng,Xu Hao,Pan Xipeng,Lan Rushi,Luo Xiaonan
      Vol. 29, Issue 3, Pages: 608-619(2024) DOI: 10.11834/jig.230332
      GZMH: a dataset of breast cancer pathological images for mitosis nuclei detection and segmentation
      摘要:ObjectiveMitosis nuclei count is one of the three important scoring indexes in the diagnosis and histological grading of breast cancer because this index is used to evaluate the aggressiveness of tumors and to provide markedly comprehensive and reliable information for accurate diagnosis and treatment. In current clinical practice, hematoxylin and eosin staining (H&E) staining is mostly used for pathological sections. Histopathological images stained with H&E can intuitively display cell components and tissue structures. For deep learning-based automated mitosis detection studies, pathologists need to manually label observed mitotic cells at high power field (HPF), which is an extremely tedious and time-consuming task requiring extensive experience and professional equipment. However, computer-assisted automatic detection, especially the introduction of deep learning methods, has attracted increasing attention from researchers in recent years because it helps reduce doctors’ workload and improve diagnostic efficiency. Multiple competitions (e.g., ICPR Mitosis Detection Challenge in 2012, AMIDA13 competition at MICCAI 2013) have been held internationally to study the specific application of deep learning methods in the mitosis detection of breast cancer. These competitions have attracted many researchers to participate, and numerous excellent methods based on these datasets have emerged. However, most public datasets in the current research are selected by organizers and data providers, which are relatively different from data used in a clinical environment and not conducive to the test and verification of model performance and generalization ability. Given the preceding problems, this research published a GZMH dataset from the clinical environment of Ganzhou Municipal Hospital in China.MethodThe published GZMH dataset contains 55 clinical breast cancer pathological images of whole slide images (WSIs), which provides two types of annotations for mitosis nuclei target detection and semantic segmentation research. Moreover, annotations from three primary pathologists are checked by two senior doctors. The GZMH dataset contains 1 534 RGB channel electronic images with a resolution of 2 084 × 2 084 pixels and 2 355 mitotic regions. First, the dataset selects 55 WSIs from 109 finely labeled WSIs as the original data of GZMH. Second, the dataset uses sliding window to cut the corresponding area’s HPF in the XML file on WSI; HPF is cut only once when the center of the circumscribed rectangle of the nucleus is within the current HPF range. To avoid numerous nuclear fragments, we only keep the grid where the center of the circumscribed rectangle of the nucleus is located. After the preceding data processing, mitosis nuclei is labeled, in which the pixel level is labeled as a black-and-white binary label, and the target detection label is the minimum circumscribed rectangular coordinates and centroid coordinates of the nuclear fission image area. Eventually, a large-scale dataset is formed. Five mainstream object detection methods and five classical segmentation methods are trained and tested on the GZMH dataset to assess their performance on the GZMH dataset.ResultThis study uses five mainstream object detection models (i.e., Faster RCNN, FSAF, RetinaNet, YOLOv3, and SSD) and five classical segmentation models (i.e., U-Net, SegNet, R2U-Net, LinkNet34, and DeepLabV3+) to organize the experiments. 
In the comparison of experimental results of the object detection methods, the SSD model achieved the best performance, with an F1-score of 0.511. In the comparison of experimental results of the segmentation methods, R2U-Net achieved the best performance, with an F1-score of 0.430. The performance of all methods on the large-scale GZMH clinical dataset is evidently lower than their results on some public datasets. Conclusion: We published a dataset for mitotic nuclei detection, which is characterized by a large number of cases and rich data types, with data characteristics that approximate the actual application scenarios. In addition, the problems of memory bottleneck and nuclear fragmentation are solved through data processing. We evaluate 10 representative methods of object detection and semantic segmentation on this new dataset and review the challenging problems of various algorithms. The proposed GZMH dataset can support the research tasks of mitosis nuclei detection and semantic segmentation. Moreover, images in this dataset approximate the actual application scenarios, which is invaluable in promoting the research progress and clinical application of mitosis nuclei segmentation in breast pathological images. The proposed dataset is available at: https://doi.org/10.57760/sciencedb.08547.
      Keywords: breast cancer; pathological image; mitosis nuclei; object detection; semantic segmentation; dataset
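The tiling rule described in this abstract (each annotated nucleus is kept only in the high-power field whose bounds contain the center of its circumscribed rectangle, so no nucleus is cropped twice) can be illustrated with a short sketch. This is a minimal illustration under assumed data conventions, not the dataset's actual preprocessing code; the function name and the box format are assumptions, while the 2 084-pixel tile size comes from the abstract.

```python
import numpy as np

def assign_nuclei_to_hpfs(bboxes, hpf_size=2084):
    """Assign each annotated nucleus to exactly one HPF tile.

    A nucleus is kept only in the tile that contains the center of its
    circumscribed (bounding) rectangle, so no nucleus is cut twice.

    bboxes: (N, 4) array of [x_min, y_min, x_max, y_max] in WSI coordinates.
    Returns a dict mapping (tile_row, tile_col) -> list of tile-local boxes.
    """
    bboxes = np.asarray(bboxes, dtype=np.float64)
    centers = (bboxes[:, :2] + bboxes[:, 2:]) / 2.0           # (N, 2) box centers
    tile_idx = np.floor(centers / hpf_size).astype(int)        # grid cell per nucleus

    tiles = {}
    for box, (col, row) in zip(bboxes, tile_idx):
        offset = np.array([col, row, col, row]) * hpf_size      # tile origin in WSI coords
        tiles.setdefault((row, col), []).append(box - offset)   # tile-local coordinates
    return tiles

# Example: two nuclei, the second falls in the tile to the right.
boxes = [[100, 120, 140, 160], [2100, 50, 2140, 90]]
print(assign_nuclei_to_hpfs(boxes).keys())   # dict_keys([(0, 0), (0, 1)])
```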
      Published: 2024-03-13

      Image Analysis and Recognition

    • Pan Jianshan,Lin Li,Wu Jiewei,Liu Yixiang,Chen Xiaohua,Lin Qiyou,Huang Jianye,Tang Xiaoying
      Vol. 29, Issue 3, Pages: 620-636(2024) DOI: 10.11834/jig.230295
      pFedWSD: unified weakly supervised personalized federated image segmentation via similarity-aware distillation
      摘要:ObjectiveFederated learning (FL) allows multiple healthcare institutions to collaboratively train a powerful deep learning model without compromising data privacy and security (i.e., centralizing data). However, employing a single model to accommodate the diverse data distributions from different sites is extremely challenging. Performance degradation is common for existing approaches when huge distribution gaps exist across sites. Additionally, previous works paid little attention to FL under weak supervision, especially under the supervision of different sparsely grained forms (i.e., point-, bounding box-, scribble-, block-wise). Weakly supervised FL is clinically practical but challenging. To address this issue, we propose a unified and weakly-supervised personalized FL framework named pFedWSD, targeting medical image segmentation and based on similarity-aware knowledge distillation across multiple sites. We aim to accommodate the domain gaps and annotation drifts across multiple sites and enhance the segmentation model’s performance for each site.MethodThe proposed pFedWSD trains a personalized model for each site via cyclic knowledge distillation, which consists of two stages: uncertainty-aware dynamic and cyclic common knowledge accumulation and similarity-aware personalization. In the first stage, during each training round, the performance of each site’s model is dynamically ranked in an uncertainty-aware manner, and common knowledge is accumulated in the form of cyclic knowledge distillation. In the second stage, the similarity between two sites is measured and aggregated based on the statistics from the batch normalization layers to attain a teacher model for each site and perform knowledge distillation. As for weakly-supervised learning, a combination of partial cross-entropy loss, gated conditional random field (CRF) loss, and tree energy loss is employed. Specifically, the partial cross-entropy loss is employed for supervising the annotated regions, ensuring informative guidance. The tree energy loss establishes pairwise affinities on the basis of the preserved characteristics of high and low semantic spatial structures for the same object. This approach, in conjunction with the model’s predictions, generates soft pseudo-labels for the unlabeled regions. Through continuous online training and refinement, the model’s predictions and the delivered pseudo-annotations gradually improve over time. Furthermore, the gated CRF loss serves as a regularization term, effectively curbing the potential issues of excessive expansion or contraction of the target regions’ pseudo-labels that may arise from solely employing the tree energy loss. This approach adeptly consolidates diverse sparsely annotated data for training, facilitating real-time generations of additional pseudo proposals, and consequently attaining exceptional segmentation performance without requiring supplementary supervised data, iterative optimization, nor time-intensive post-processing. To the best of our knowledge, pFedWSD is a pioneering weakly supervised personalized federated learning approach for medical image segmentation and adeptly implemented under heterogeneous annotation settings on multiple client devices.ResultWe create two datasets (from multiple publicly available datasets), each with five subsets serving as five different sites, for optic/disc cup (OD/OC) segmentation and retinal foveal avascular zone (FAZ) segmentation, respectively. 
Quantitative and qualitative experimental results show that pFedWSD outperforms representative state-of-the-art (SOTA) centralized and personalized FL methods in terms of Dice coefficients and HD95 statistics. The proposed pFedWSD achieves an average Dice coefficient of 90.38% on the OD/OC segmentation task, exhibiting a remarkable improvement of 1.67% over the previous best-performing method. Moreover, pFedWSD demonstrates a marginal difference of only 0.58% compared with local training under full supervision and a slight gap of merely 1.23% from centralized training under full supervision. Regarding the FAZ segmentation task, the proposed method achieves an impressive average Dice coefficient of 93.12%, showcasing a substantial improvement of 6.56% over the previous state-of-the-art method. Furthermore, pFedWSD has a marginal difference of 0.5% from local training under full supervision and a mere 0.86% difference from centralized training under full supervision.ConclusionThe proposed weakly-supervised and personalized FL framework (pFedWSD) can effectively unify different forms of sparsely labeled data and train personalized models that adapt well to different data distributions, with an established superior segmentation performance. Our pFedWSD demonstrates its effectiveness through achieving optimal performance on both OD/OC and FAZ segmentation tasks across datasets from multiple centers, with its overall performance closely approaching that of local or centralized training using fully supervised labels. Extensive ablation experiments demonstrate the importance and efficacy of each stage in pFedWSD and each component in the weakly supervised composite objective. Moreover, through site-ablation experiments, we analyze the contribution of each site to the federation, providing valuable guidance for medical institutions regarding the appropriate data volume and the sparse annotation form in federated learning. Future research directions include the further reduction of the communication and computation overhead and the integration of universal large model training paradigms, like prompt learning, to concurrently foster our proposed framework’s generalization performance and adaptive personalization capacity toward diverse data distributions.  
      Keywords: similarity-aware; knowledge distillation; weakly supervised learning; personalized federated learning; medical image segmentation
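As an illustration of the second-stage idea described in this abstract, measuring inter-site similarity from batch-normalization statistics and aggregating site models into a per-site teacher, the following is a minimal PyTorch sketch. It is not the authors' implementation: the L2 distance, the softmax weighting, and all function names are assumptions.

```python
import torch
import torch.nn as nn

def bn_statistics(model):
    """Concatenate running means/vars of all BatchNorm layers into one vector."""
    stats = []
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            stats.append(torch.cat([m.running_mean, m.running_var]))
    return torch.cat(stats)

def similarity_weights(models, target_idx, temperature=1.0):
    """Aggregation weights for site `target_idx` from BN-statistics distances.

    Sites whose BN statistics are closer to the target site receive larger
    weights; a softmax over negative distances normalizes them to sum to one.
    """
    stats = [bn_statistics(m) for m in models]
    dists = torch.tensor([torch.norm(stats[target_idx] - s).item() for s in stats])
    return torch.softmax(-dists / temperature, dim=0)

def build_teacher(models, weights):
    """Weighted average of the site models' floating-point parameters."""
    ref = models[0].state_dict()
    teacher = {k: torch.zeros_like(v) for k, v in ref.items() if v.is_floating_point()}
    for w, m in zip(weights, models):
        for k, v in m.state_dict().items():
            if k in teacher:
                teacher[k] += w * v
    return teacher
```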
      Published: 2024-03-13
    • Mei Huawei,Shang Honglin,Su Pan,Liu Yanping
      Vol. 29, Issue 3, Pages: 637-654(2024) DOI: 10.11834/jig.230140
      Optic disc and cup segmentation with combined residual context encoding and path augmentation
      摘要:ObjectiveOphthalmic image segmentation is an important part of medical image analysis. Among these, optic disc (OD) and optic cup (OC) segmentation are crucial technologies for the intelligent diagnosis of glaucoma, which can cause irreversible damage to the eyes and is the second leading cause of blindness worldwide. The primary glaucoma screening method is the evaluation of OD and OC based on fundus images. The cup disc ratio (CDR) is one of the most representative glaucoma detection features. In general, eyes with CDR greater than 0.65 are considered to have glaucoma. With the continuous development of deep learning, U-Net and its variant models, including superpixel classification and edge segmentation, have been widely used in OD and OC segmentation tasks. However, the segmentation accuracy of OD and OC is limited, and their efficiency is low during training due to the loss of spatial information caused by continuous convolution and pooling operations. To improve the accuracy and training efficiency of OD and OC segmentation, we proposed the residual context path augmentation U-Net (RCPA-Net), which can capture deeper semantic feature information and solve the problem of unclear edge localization.MethodRCPA-Net includes three modules: feature coding module (FCM), residual atrous convolution (RAC) module, and path augmentation module (PAM). First, the FCM block adopts the ResNet34 network as the backbone network. By introducing the residual module and attention mechanism, the model is enabled to focus on the region of interest, and the efficient channel attention (ECA) is adopted to the squeeze and excitation (SE) module. The ECA module is an efficient channel attention module that avoids dimensionality reduction and captures cross-channel features effectively. Second, the RAC block is used to obtain the context feature information of a wider layer. Inspired by Inception-V4 and context encoder network(CE-Net), we fuse cavity convolution into the inception series network and stack convolution blocks. Traditional convolution is replaced with cavity convolution, such that the receptive field increases while the number of parameters remains the same. Finally, to shorten the information path between the low-level and top-level features, the PAM block uses an accurate low-level positioning signal and lateral connection to enhance the entire feature hierarchy. To solve the problem of extremely unbalanced pixels and generate the final segmentation map, we propose a new multi-label loss function based on the dice coefficient and focal loss. This function improves the pixel ratio between the OD/OC and background regions. In addition, we enhance the training data by flipping the image and adjusting the ratio of length and width. Then, the input images are processed using the contrast-limited adaptive histogram equalization method, and each resultant image is fused with its original one and then averaged to form a new three-channel image. This step aims to enhance image contrast and enrich image information. In the experimental stage, we use Adam optimization instead of the stochastic gradient descent method to optimize the model. The number of samples selected for each training stage is eight, and the weight decay is 0.000 1. During training, the learning rate is adjusted adaptively in accordance with the number of samples selected each time. 
In outputting the prediction results, the maximum connected region in OD and OC is selected to obtain the final segmentation result.ResultFour datasets (ORIGA, Drishti-GS1, Refuge, and RIM-ONE-R1) are employed to validate the performance of the proposed method. Then, the results are compared with various state-of-the-art methods, including U-Net, M-Net, and CE-Net. The ORIGA dataset contains 650 color fundus images of 3 072 × 2 048 pixels, and the ratio of the training set to the test set is 1∶1 during the experiment. The Drishti-GS1 dataset contains 101 images, including 31 normal images and 70 diseased images. The fundus images are divided into two datasets, Groups A and B, which include 50 training samples and 51 testing samples, respectively. The 400 fundus images in the Refuge dataset are also divided into two datasets. Group A includes 320 training samples, while Group B includes 80 testing samples. The Jaccard index and F-measure score are used in the experimentation to evaluate the results of OD and OC segmentation. The results indicate that in the ORIGA dataset, the Jaccard index and F-measure of the proposed method in OD/OC segmentation are 0.939 1/0.794 8 and 0.968 6/0.885 5, respectively. In the Drishti-GS1 dataset, the results in OD/OC segmentation are 0.951 3/0.863 3 and 0.975 0/0.926 6, respectively. In the Refuge dataset, the results are 0.929 8/0.828 8 and 0.963 6/0.906 3, respectively. In the RIM-ONE-R1 dataset, the results of OD segmentation are 0.929 0 and 0.962 8. The results of the proposed method on the four datasets are all better than those of its counterparts, and the performance of the network is significantly improved. In addition, we conduct ablation experiments for the primary modules proposed in the network, where we perform comparative experiments with respect to the location of the modules, the parameters in the model, and other factors. The results of the ablation experiments demonstrate the effectiveness of each proposed module in RCPA-Net.ConclusionIn this study, we propose RCPA-Net, which combines the advantages of deep segmentation models. The images predicted using RCPA-Net are closer to the real results, providing more accurate segmentation of OD and OC than several state-of-the-art methods. The experimentation demonstrates the high effectiveness and generalization ability of RCPA-Net.  
      Keywords: optic disc and optic cup segmentation; deep learning; attention mechanism; residual atrous convolution; path augmentation
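The preprocessing step described in this abstract, averaging each CLAHE-enhanced fundus image with its original to form the three-channel input, can be sketched with OpenCV as follows. The clip limit and tile size are illustrative defaults rather than the paper's settings, and the function name is an assumption.

```python
import cv2
import numpy as np

def clahe_fused_input(bgr_image, clip_limit=2.0, tile_grid=(8, 8)):
    """Fuse an 8-bit fundus image with its CLAHE-enhanced version by averaging.

    CLAHE is applied per channel; the enhanced image is then averaged with the
    original to form a contrast-enhanced three-channel network input.
    """
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid)
    channels = [clahe.apply(c) for c in cv2.split(bgr_image)]    # per-channel CLAHE
    enhanced = cv2.merge(channels)
    fused = (bgr_image.astype(np.float32) + enhanced.astype(np.float32)) / 2.0
    return fused.astype(np.uint8)
```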
      Published: 2024-03-13
    • Chai Jingwen,Li Ankang,Zhang Hao,Ma Yong,Mei Xiaoguang,Ma Jiayi
      Vol. 29, Issue 3, Pages: 655-669(2024) DOI: 10.11834/jig.230356
      3D multi-organ segmentation network combining local and global features and multi-scale interaction
      摘要:ObjectiveHighly conformal radiotherapy is a widely adopted cancer treatment modality requiring meticulous characterization of cancer tissues and comprehensive delineation of the surrounding anatomical structures. The efficacy and safety of this technique depend generally on the ability to precisely target the tumor, necessitating a thorough understanding of the corresponding organ-at-risk anatomy. Thus, accurate and detailed depiction of the neoplastic and adjacent normal tissues using advanced imaging techniques is critical in optimizing the outcomes of highly conformal radiotherapy. Given the current inadequacy of conventional segmentation methods in achieving accurate and efficient delineation of multi-organ structures from 3D medical images, there exists a promising opportunity for research on developing precise and automated segmentation techniques using deep learning approaches. By leveraging the capacity of deep neural networks (DNNs) to learn complex hierarchical representations from vast amounts of labeled data, this technique can facilitate the identification and extraction of specific features and patterns from medical images, leading to considerably reliable and efficient segmentation outcomes. This method could significantly enhance the clinical utility of imaging data in various diagnostic and therapeutic applications, including but not limited to radiation therapy planning, surgical navigation, and disease assessment. Over the past few years, there has been increasing interest in exploring the benefits of integrating vision Transformer (ViT) with convolutional neural networks (CNNs) to enhance the quality and accuracy of semantic segmentation tasks. One promising research direction that has emerged involves addressing the issue of multi-scale representation, which is critical for achieving robust and precise segmentation results on various medical imaging datasets. However, current state-of-the-art methods have failed to fully maximize the potential of multi-scale interaction between CNNs and ViTs. For example, some methods completely disregard multi-scale structures or achieve it by limiting the computational scope of ViTs. Other methods rely solely on CNN or ViT at the same scale, disregarding their complementary advantages. In addition, the existing multi-scale interaction methods often neglect the spatial association between two-dimensional slices, resulting in poor performance in processing volume data. Therefore, further research is needed to solve the aforementioned problems.MethodThis research aims to address the limitations of existing methods for multi-organ segmentation in 3D medical images by proposing a new approach. By recognizing the importance of simultaneously determining local and global features at the same scale, a universal feature encoder known as the LoGoF module is introduced for use in multi-organ segmentation networks. This method enables the creation of an end-to-end 3D medical image multi-organ segmentation network (denoted as M0), which leverages the LoGoF module. To further enhance the model’s ability to determine complex relationships between organs at different scales, a multi-scale interaction module and an attention-guided structure are incorporated into M0. These novel techniques introduce spatial priors into the features extracted at different scales, enabling M0 to accurately perceive inter-organ relationships and identify organ boundaries. 
By leveraging the preceding advanced components, the proposed model, called LoGoFUNet, enables robust and efficient multi-organ segmentation in 3D medical images. Overall, this approach represents a significant step forward in advancing the accuracy and efficiency of multi-organ segmentation in clinical applications.ResultIn experiments conducted on two well-known medical imaging datasets (i.e., Synapse and SegTHOR), LoGoFUNet demonstrated impressive gains in accuracy over the second-best performing model. Compared with the runner-up, LoGoFUNet achieved a 2.94% improvement in the Dice similarity coefficient on the Synapse dataset, and a 4.93% improvement on the SegTHOR dataset. Furthermore, the 95th percentile Hausdorff distance index showed a significant decrease of 8.55 and 2.45 on Synapse and SegTHOR, respectively, indicating an overall improvement in multi-organ segmentation performance. On the ACDC dataset, the applicability of the 3D segmentation method is mostly poor, but LoGoFUNet still obtains better results than the 2D advanced method. This result indicates LoGoFUNet’s superior adaptability and versatility to different types of datasets. These findings suggest that LoGoFUNet is a highly competitive and robust framework for accurate multi-organ segmentation in various clinical settings. This study conducts further ablation experiments to provide additional evidence supporting the effectiveness of and justification for LoGoFUNet. These experiments serve to verify the role and contribution of each of the proposed components, including the LoGoF encoder, multi-scale interaction module, and attention-guidance structure, in achieving the superior segmentation performance observed with LoGoFUNet. By systematically removing and evaluating the impact of each component on segmentation accuracy, these experiments confirm that the proposed module design is rational and effective. Thus, results of the ablation experiments further reinforce the value and potential clinical significance of adopting the LoGoFUNet framework for multi-organ segmentation in 3D medical imaging applications.ConclusionThe experimental evaluation of the proposed segmentation model suggests that it effectively integrates information exchange within and between different scales. This outcome leads to improved segmentation performance and superior generalization capabilities on the dataset. By facilitating the interaction of multi-scale representations and leveraging novel techniques, such as intra- and inter-scale information exchange mechanisms, this approach enables the model to accurately determine complex spatial relationships and produce high-quality segmentations across a range of 3D medical imaging datasets. Findings highlight the importance of multi-scale features and information exchange in achieving robust and accurate medical image segmentation results. Lastly, results suggest that the proposed framework could provide significant benefits in a variety of clinical applications.  
      Keywords: multi-organ segmentation; deep neural network (DNN); vision Transformer (ViT); local-global feature; multi-scale interaction (MSI)
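To make the idea of same-scale local-global feature fusion concrete, the toy block below combines a 3D convolutional (local) branch with a multi-head self-attention (global) branch and fuses them with a 1 × 1 × 1 convolution. It only illustrates the general CNN-plus-attention pattern; it is not the paper's LoGoF module, and every layer choice here is an assumption.

```python
import torch
import torch.nn as nn

class LocalGlobalBlock(nn.Module):
    """Toy same-scale fusion of a convolutional (local) branch and a
    multi-head self-attention (global) branch over 3D feature maps."""

    def __init__(self, channels, heads=4):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm3d(channels), nn.GELU())
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)
        self.fuse = nn.Conv3d(2 * channels, channels, kernel_size=1)

    def forward(self, x):                      # x: (B, C, D, H, W)
        local = self.local(x)
        b, c, d, h, w = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2))   # (B, D*H*W, C)
        global_feat, _ = self.attn(tokens, tokens, tokens)
        global_feat = global_feat.transpose(1, 2).reshape(b, c, d, h, w)
        return self.fuse(torch.cat([local, global_feat], dim=1))

# Example on a small 3D feature map.
block = LocalGlobalBlock(channels=32)
print(block(torch.randn(1, 32, 8, 16, 16)).shape)   # torch.Size([1, 32, 8, 16, 16])
```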
      Published: 2024-03-13
    • Zhou Qi,Yang Hang,Tian Chuangeng,Tang Lu,Hui Yu
      Vol. 29, Issue 3, Pages: 670-685(2024) DOI: 10.11834/jig.230324
      Multi-scale fusion-enhanced ultrasound elastic images segmentation for mediastinal lymph node
      摘要:ObjectiveUltrasound elastography enables non-invasive diagnosis of lesion tissues by analyzing the differences in hardness among different body tissues. It is gradually being used in the diagnoses of many diseases. In bronchial ultrasound elastography, accurately segmenting mediastinal lymph nodes from images is significant for diagnosing whether lung cancer has metastasized and has an important role in the consequent staging and diagnosis of cancer. Manual segmentation methods performed by radiologists are always time-consuming, and research on automated segmentation, specifically for ultrasound elastic images, is limited. Therefore, deep learning-based assisted segmentation methods have attracted considerable attention. Although ultrasound elastic images can provide some guidance for the segmentation of regions of interest, the obscuring of texture information in this area also makes segmentation challenging to execute. Existing research has focused primarily on the encoder structure of the model, particularly by incorporating different pre-trained models to accommodate the three-channel data format of ultrasound elastic images. However, limited research has been conducted on the intermediate features obtained by the encoder and decoder structures, resulting in less precise segmentation results. Therefore, this study proposes a network for the segmentation of the mediastinal lymph node, called attention-based multi-scale fusion enhanced ultrasound elastic images segmentation network for mediastinal lymph node (AMFE-UNet).MethodFirst, a pre-trained dense convolutional network (DenseNet) with dense connections is introduced into the U-Net architecture to extract channel and position information from ultrasound elastic images. Second, to model the boundaries and textures of the nodules from different scales and scopes, this research enhanced the decoder module with efficient channel attention (ECA) and dilated convolutions. Three dilated convolution branches and one pooling branch are set up in each decoder module. Different combinations of the results from these branches are used to obtain the following four decoder structures. 1) Decoder-A: Results from each branch are added and passed through the ECA module. 2) Decoder-B: Results from each branch are concatenated along the channel dimension and passed through an ECA module. 3) Decoder-C: Each branch is equipped with an ECA module, and results from each branch are concatenated along the channel dimension. 4) Decoder-D: Results from each branch are densely connected and passed through an ECA module. Lastly, selective kernel network (SK-Net) is used to enhance the fusion of features obtained from the encoder and decoder, ensuring a considerably comprehensive integration. In the experiments, the proposed models are implemented using Python 3.7 and PyTorch 1.12. The image processing workstation is equipped with an Intel i9-13900K CPU and two NVIDIA RTX 4090 GPUs, each with 24 GB memory. The initial parameters of the model are obtained using the default initialization method in PyTorch. The Adam optimizer is used to update the network parameters. Learning rate is initially set to 0.000 1, with a weight decay coefficient of 0.1, and it is decayed every 90 iterations. Dice coefficient is used as loss function, and the model is trained for 190 epochs.ResultThe experiment is performed on a collected dataset of bronchial ultrasound elastic images with six-fold cross-validation. 
The evaluation metrics include the Dice coefficient, sensitivity, specificity, precision, intersection over union (IoU), Hausdorff distance 95th percentile (HD95), parameters, and GFlops. The range of the first five metrics is between 0 and 1; a higher value indicates better segmentation performance. HD95 does not have a specified range, and a lower value indicates better segmentation performance. The ablation experiments show improvements from the proposed skip connection structure and decoder structure. The model using SK-Net as skip connections is only slightly less sensitive than Dense-UNet, while the remaining five metrics are better than those of Dense-UNet. The four models using the multi-scale fusion-enhanced decoder outperform Dense-UNet by 0.4% to 0.9% in Dice coefficient and by up to 2% in precision. Two final models are designed according to the ablation experiments: AMFE-UNet A and AMFE-UNet B. AMFE-UNet is compared with a variety of models, including U-Net, Att-UNet, Seg-Net, DeepLabV3+, Trans-UNet, U-Net++, BPAT-UNet, CTO, and ACE-Net. The Dice coefficient of AMFE-UNet is 86.59% on average, which is an improvement of 1.983% compared with U-Net. AMFE-UNet A is optimal in terms of Dice coefficient, precision, and specificity. Meanwhile, AMFE-UNet B is optimal in terms of sensitivity, IoU, and HD95. The class activation maps demonstrate that AMFE-UNet achieves better segmentation sensitivity and completeness by focusing on the content of the region at the lower levels of the network and on the boundaries of the region at the higher levels of the network. The other networks only focus on the content of the region and are ineffective at segmenting the region’s boundaries. The loss variation curves for training and testing of the model indicate that AMFE-UNet B has faster convergence and better segmentation than AMFE-UNet A. Conclusion: Extensive experiments demonstrate the excellent segmentation effectiveness of AMFE-UNet, which combines attention mechanisms, for ultrasound elastic images, and this has significance for future research on multichannel medical images. The code is available at https://github.com/Philo-github/AMFE-UNet.
      Keywords: ultrasound elastography (UE); mediastinal lymph nodes; instance segmentation; U-Net; channel attention mechanism
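The decoder variants described in this abstract can be illustrated with a sketch of Decoder-A: three dilated-convolution branches and a pooling branch whose outputs are summed and then passed through an efficient channel attention (ECA) module. Kernel sizes, dilation rates, and the ECA kernel length are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient channel attention: 1D convolution over channel-wise pooled features."""
    def __init__(self, k=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):                               # x: (B, C, H, W)
        w = x.mean(dim=(2, 3), keepdim=True)             # global average pooling
        w = self.conv(w.squeeze(-1).transpose(1, 2))      # 1D conv across channels
        w = torch.sigmoid(w).transpose(1, 2).unsqueeze(-1)
        return x * w                                      # channel reweighting

class DecoderA(nn.Module):
    """Decoder-A: sum of three dilated-conv branches and a pooling branch, then ECA."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d) for d in (1, 2, 4)])
        self.pool_branch = nn.Sequential(
            nn.AvgPool2d(3, stride=1, padding=1), nn.Conv2d(in_ch, out_ch, 1))
        self.eca = ECA()
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        out = sum(b(x) for b in self.branches) + self.pool_branch(x)
        return self.act(self.eca(out))
```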
      Published: 2024-03-13
    • Qin Jun,Lu Tinglan,Ji Bai,Li Yuqing
      Vol. 29, Issue 3, Pages: 686-696(2024) DOI: 10.11834/jig.230540
      Tooth segmentation network for low-dose CT
      Abstract: Objective: The field of dentistry has been developed and improved through the mutual penetration and integration of computer technology and modern dentistry. Cone beam computed tomography (CBCT) has become one of the most commonly used medical imaging techniques in dental diagnoses and treatments. CBCT has the advantages of low radiation dosage, simple operation, and low cost. However, the noise and artifacts of CBCT are more intense than those of conventional CT, and fuzzy tooth boundaries will affect doctors’ diagnoses and subsequent treatments. In oral diagnoses and treatments, doctors usually need to manually segment the tooth model in CBCT to formulate subsequent treatment plans. However, this method is time-consuming and labor-intensive, and the segmentation results of the teeth are also significantly affected by doctors’ subjective factors. In addition, existing network methods often fail to achieve the expected results. Deep learning-based segmentation networks still leave room for improvement in segmentation accuracy and other aspects of performance, with limitations including gradient explosion, overfitting, and the inability to focus on global image information. Therefore, people have been working to find a dental segmentation method with high automation and high accuracy. To address this problem, a dental segmentation model called multi-scale feature extraction module and coordinate attention (CA) mechanism network (MF-CA Net) is proposed, which uses a series of innovative methods to improve the accuracy and robustness of dental segmentation. Method: The MF-CA Net network uses the multi-scale feature extraction module (MFEM) to extract features at different scales of images and utilizes the CA attention mechanism, which currently excels at improving network performance. MFEM uses four different convolution kernels for convolution, enabling the extraction of multi-scale features and helping the network learn markedly robust representations. Meanwhile, dilated convolution uses four dilation rates to further increase the receptive field, enabling the network to obtain significantly detailed information and refine important features. The CA attention mechanism calculates the spatial and channel attention weights in the input feature maps. It adaptively weights the feature map, emphasizing more representative local structures and global contextual information. By embedding positional information into the channel attention, the CA mechanism assists the network in accurately localizing and identifying the objects of interest. These modules enable the network to accurately determine the tooth region of interest and extract extensive and dense multi-scale feature information to effectively guide the segmentation task. For tooth root segmentation, these modules can significantly improve the accuracy of segmentation. To further improve the performance of the segmentation algorithm, the MF-CA Net network model also uses structural similarity to construct the boundary loss function. Moreover, the algorithm uses a combination of the Dice, binary cross-entropy, and structural similarity (SSIM) loss functions as the final loss function. The Dice loss function is used to compute the similarity between two sets of images, whereas the cross-entropy loss function measures the pixel-wise agreement between the predicted segmentation result and the real segmentation result. This loss function integrates tooth edge segmentation at three levels, namely, pixel, local, and global, to improve the accuracy and robustness of the algorithm. Result: To more accurately evaluate the performance of the proposed model in tooth segmentation, the Dice similarity coefficient, mean intersection over union (mIoU), accuracy, recall, precision, and F2 score are used as evaluation metrics. This study compares the MF-CA Net model and six mainstream methods on the dataset. Experimental results show that the MF-CA Net model has significant improvements in most of the evaluation metrics compared with other segmentation methods. Although MF-CA Net is slightly lower than DeeplabV3+ in the accuracy metric, it achieves a high score of 0.949 5 in the Dice metric, which is an improvement of 4% compared with PyConvU-Net, about 4% compared with DeeplabV3+, and about 16% compared with U-Net. In addition, the mIoU metric improves by 3% to nearly 11%. The precision value reaches 0.942 1, which is a 7% improvement compared with UNet++. The recall metric reaches 0.968 7, which is an 8% improvement compared with the U-Net network. Lastly, the F2 metric reaches 0.954 3, which is a 5% improvement compared with the Res-UNet value. The results fully demonstrate the superiority of the MF-CA Net model in tooth segmentation. Conclusion: The proposed MF-CA Net model successfully addresses the difficult problem of tooth segmentation in CBCT images by introducing a multi-scale feature extraction module, an attention mechanism, and a hybrid loss function. Extensive experimental results verify the proposed model’s superiority in accurate tooth segmentation. The proposed model is expected to be widely used in dental diagnoses and treatments, where it is of significant value.
      Keywords: deep learning; cone beam computed tomography (CBCT); tooth segmentation; attention mechanism; multi-scale information; loss function; segmentation accuracy
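A minimal sketch of the combined loss described in this abstract (Dice + binary cross-entropy + SSIM) is given below. The SSIM term here is a simplified global (non-windowed) version, and the loss weights are assumptions; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss on probabilities, averaged over the batch."""
    inter = (pred * target).sum(dim=(1, 2, 3))
    union = pred.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return 1 - ((2 * inter + eps) / (union + eps)).mean()

def ssim_loss(pred, target, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified SSIM computed globally per image (no sliding window)."""
    mu_p, mu_t = pred.mean(dim=(1, 2, 3)), target.mean(dim=(1, 2, 3))
    var_p, var_t = pred.var(dim=(1, 2, 3)), target.var(dim=(1, 2, 3))
    cov = ((pred - mu_p.view(-1, 1, 1, 1)) * (target - mu_t.view(-1, 1, 1, 1))).mean(dim=(1, 2, 3))
    ssim = ((2 * mu_p * mu_t + c1) * (2 * cov + c2)) / \
           ((mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2))
    return 1 - ssim.mean()

def combined_loss(logits, target, w_dice=1.0, w_bce=1.0, w_ssim=1.0):
    """Dice + BCE + (simplified) SSIM loss on a binary tooth mask (B, 1, H, W)."""
    prob = torch.sigmoid(logits)
    bce = F.binary_cross_entropy_with_logits(logits, target)
    return w_dice * dice_loss(prob, target) + w_bce * bce + w_ssim * ssim_loss(prob, target)
```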
      Published: 2024-03-13
    • Shen Yiting,Chen Zhao,Zhang Qinghua,Chen Jinhao,Wang Qingguo
      Vol. 29, Issue 3, Pages: 697-712(2024) DOI: 10.11834/jig.230477
      Fully and weakly supervised graph networks for histopathology image segmentation
      摘要:ObjectiveComputer-assisted techniques and histopathology image processing technologies have significantly facilitated pathological diagnoses. Among them, histopathology image segmentation is an integral component of histopathology image processing, which generally refers to the separation of target regions (e.g., tumor cells, glands, and cancer nests) from the background, is further used for downstream tasks (e.g., cancer grading and survival prediction). In recent years, the rapid development of deep learning has resulted in significant breakthroughs in histopathology image segmentation. Segmentation networks, such as FCN and U-Net, have demonstrated strong capabilities in accurately delineating edges. However, most existing deep learning methods rely on fully supervised learning mode, which depends on numerous accurately annotated digital histopathology images. Manual annotation, conducted by medical professionals with expertise in histopathology, is time-consuming and also introduces a high likelihood of missed diagnoses and false detections. Consequently, there is a scarcity of histopathology images with precise annotations. Moreover, histopathology images are highly complex, making it extremely challenging to distinguish targets from the background, thereby leading to inter-class homogeneity. Within the same dataset of tissue samples, there are significant variations among pathological objects, exhibiting intra-class heterogeneity. Differences between patients and nonlinear relationships between image features impose high requirements on the robustness and generalization of histopathological tissue segmentation algorithms. Therefore, this study proposes a graph-based framework for histopathology image segmentation.MethodThe framework consists of two modes, namely, fully supervised graph network (FSGNet) and weakly supervised graph network (WSGNet), aiming to adapt to datasets with different levels of annotation and precision requirements in various application scenarios. FSGNet is used when working with samples having pixel-level labels and requiring high accuracy. It is trained in a fully supervised manner. Meanwhile, WSGNet is utilized when dealing with samples that only have sparse point labels. It utilizes weakly supervised learning to extract histopathology image information and train the segmentation network. Furthermore, the proposed framework uses graph convolutional networks (GCN) to represent the irregular morphology of histopathological tissues. GCN is capable of handling data with arbitrary structures and learns the nonlinear structure of images by constructing a topological graph based on histopathology images. This approach contributes to improving the accuracy of histopathology image segmentation. The current study introduces graph Laplacian regularization to facilitate the learning of similar features from neighboring nodes, effectively aggregating similar nodes and enhancing the proposed model’s generalization capability. FSGNet consists of a backbone network and GCN. The backbone network follows an encoder-decoder structure to extract deep features from histopathology images. GCN is used to learn the nonlinear structure of histopathological tissues, enhancing the network’s expressive power and generalization ability, ultimately resulting in the segmentation of target regions from the background. WSGNet utilizes simple linear iterative clustering (SLIC) for superpixel segmentation of the original image. 
This method transforms the weakly supervised semantic segmentation problem into a binary classification problem for superpixels. WSGNet leverages local spatial similarity to reduce the computational complexity of subsequent processing. In the preprocessing stage, the semantic information of point labels can be propagated to the entire superpixel region, thereby generating superpixel labels. WSGNet is capable of accomplishing the segmentation of histopathology images even with a limited number of point annotations. Result: This study conducted tests on two public datasets, namely, the Gland Segmentation Challenge Dataset (GlaS) and the Colorectal Adenocarcinoma Gland (CRAG) dataset, as well as one private dataset called Lung Squamous Cell Carcinoma (LUSC). GlaS consists of 165 images, with a training-to-testing ratio of 85:80. It is stratified based on histological grades and fields of view, and the testing set is further divided into Parts A and B (60 and 20 images, respectively). CRAG comprises 213 images of colorectal adenocarcinoma, with a training-to-testing ratio of 173:40. LUSC contains 110 histopathological images, with a training-to-testing ratio of 70:40. The performance of FSGNet was compared with those of FCN-8, U-Net, and UNeXt. WSGNet was compared with recently proposed weakly supervised models, such as WESUP, CDWS, and SizeLoss. The two modes of the proposed framework outperformed the comparison algorithms in terms of overall accuracy (OA) and Dice index (DI) on the three datasets. FSGNet achieved an OA of 88.15% and DI of 89.64% on GlaS Part A, OA of 91.58% and DI of 91.23% on GlaS Part B, OA of 93.74% and DI of 92.58% on CRAG, and OA of 92.84% and DI of 93.20% on LUSC. WSGNet achieved an OA of 84.27% and DI of 86.15% on GlaS Part A, OA of 84.91% and DI of 83.60% on GlaS Part B, OA of 85.50% and DI of 80.17% on CRAG, and OA of 88.45% and DI of 87.89% on LUSC. The results indicate that the proposed framework exhibits robustness and generalization capabilities across different datasets because its performance does not vary significantly. Conclusion: The two modes of the proposed framework demonstrate excellent performance in histopathological image segmentation. Qualitative segmentation results indicate that the framework is able to achieve more complete segmentation of examples and provide more accurate predictions of the central regions of the target samples. It exhibits fewer instances of missed and false detections, thereby showcasing strong generalization and robustness.
      Keywords: histopathology image segmentation; graph convolutional network (GCN); fully supervised learning; weakly supervised learning; point labels
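The WSGNet preprocessing step described in this abstract, propagating sparse point labels to whole SLIC superpixels, can be sketched as follows using scikit-image. The segment count, compactness, and label conventions are illustrative assumptions rather than the paper's settings.

```python
import numpy as np
from skimage.segmentation import slic

def propagate_point_labels(image, points, n_segments=600, compactness=10):
    """Spread sparse point annotations to whole superpixel regions.

    image:  (H, W, 3) RGB histopathology tile.
    points: list of (row, col, label) with label 1 = target, 0 = background.
    Returns (superpixel_map, label_map), where label_map is -1 for unlabeled
    superpixels and 0/1 where a point annotation fell inside the superpixel.
    """
    segments = slic(image, n_segments=n_segments, compactness=compactness, start_label=0)
    label_map = np.full(segments.shape, -1, dtype=np.int8)
    for r, c, lab in points:
        label_map[segments == segments[r, c]] = lab   # propagate point label to its superpixel
    return segments, label_map
```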
      Published: 2024-03-13
    • Cao Weijie,Duan Xianhua,Xu Zhenwei,Sheng Shuai
      Vol. 29, Issue 3, Pages: 713-724(2024) DOI: 10.11834/jig.230233
      Application of U-Net channel transformation network in gland image segmentation
      摘要:ObjectiveAdenocarcinoma is a malignant tumor originating from the glandular epithelium and poses immense harm to human health. With the rapid development of computer vision technology, medical imaging has become an important means for expert preoperative diagnosis. In the diagnosis of adenocarcinoma, doctors judge the severity of the cancer and grade it by analyzing the size, shape, and other external features of the glandular structure. Accordingly, achieving high-precision segmentation of glandular images has become an urgent requirement in clinical medicine. Glandular medical image segmentation refers to the process of separating the glandular region from the surrounding tissue in medical images, requiring high segmentation accuracy. Traditional models for segmenting glandular medical images can suffer from such problems as imprecise segmentation and mis-segmentation owing to the diverse shapes of glands and presence of numerous small targets. To address this issue, this study proposes an improved glandular medical image segmentation algorithm based on UCTransNet. UCTransNet addresses solves the semantic gap between different resolution modules of the encoder and between the encoder and decoder, thereby achieving high precision image segmentation.MethodFirst, a combination of the fusion of ASPP_SE and ConvBatchNorm modules is added to the front end of the encoder. The ASPP_SE module combines the ASPP module and channel attention mechanism. The ASPP module consists of three different dilation rates of atrous convolution, a 1 × 1 convolution, and an ASPP pooling. Atrous convolution injects holes into standard convolution to expand the receptive field, obtain dense data features, and maintain the same output feature map size. The ASPP module uses multi-scale atrous convolution to obtain a large receptive field, and fuses the obtained features with the global features obtained from the ASPP pooling to obtain denser semantic information than the original features. The channel attention mechanism enables the model to focus considerably on important channel regions in the image, dynamically select information in the image, and give substantial weight to channels containing important information. In the CCT (channel cross fusion with Transformer), modules with higher weight of important information will achieve better fusion. The ConvBatchNorm module enhances the ability of the encoder to extract the features of small targets, while preventing overfitting during model training. Second, a simplified dense connection is embedded between the encoder and the skip connections, and the CCT in the model performs global feature fusion of the features extracted by the encoder from a channel perspective. Although the global attention ability of the CCT is strong, its problem is a weak local attention ability, and the ambiguity between adjacent encoder modules has not been solved. To solve this problem, a dense connection is added to enhance the local information fusion ability. The dense connection passes the upper encoder module through convolution pooling to obtain the lower encoder module and performs upsampling on the lower encoder to make its resolution consistent with the upper encoder module. The two encoder modules are concatenated on the channel, and the resolution does not change after concatenation. After concatenation, the upper encoder module obtains the feature information supplement of the lower encoder module. 
Consequently, the semantic fusion between adjacent modules is enhanced, the semantic gap between adjacent encoder modules is reduced, and the feature information fusion between adjacent encoder modules is improved. A refiner is added to the CCT, which projects the self-attention map to a higher dimension and uses head convolution to enhance the spatial context and local patterns of the attention map. This method effectively combines the advantages of self-attention and convolution to further improve the self-attention mechanism. Lastly, a linear projection is used to restore the attention map to the initial resolution, thereby enhancing the global feature information fusion of the encoder. In summary, the fused ASPP_SE and ConvBatchNorm modules are added to the front end of the UCTransNet encoder to enhance its ability to extract small-target features and prevent overfitting. Second, a simplified dense connection is embedded between the encoder and the skip connections to enhance the fusion of adjacent module features. Lastly, a refinement module is added to the CCT to project the self-attention map to a higher dimension, thereby enhancing the global feature fusion ability of the encoder. The combination of the simplified dense connection and the CCT refinement module improves the performance of the model.ResultThe improved algorithm was tested on the publicly available gland data sets MoNuSeg and Glas. The Dice and intersection over union (IoU) coefficients were the main evaluation metrics used. The Dice coefficient is a similarity measure used to represent the similarity between two samples. By contrast, the IoU coefficient is a standard used to measure the accuracy of the result’s positional information. Both metrics are commonly used in medical image segmentation. The test results on the MoNuSeg data set were 80.55% and 67.32%, while those on the Glas data set were 92.23% and 86.39%. These results represent improvements of 0.88% and 1.06%, and 1.53% and 2.43%, respectively, compared with those of the original UCTransNet. The improved model was compared with existing popular segmentation networks and was found to generally outperform them.ConclusionThe proposed improved model is superior to existing segmentation algorithms in medical gland segmentation and can meet the requirements of clinical medical gland image segmentation. The CCT module in the original model was further optimized to fuse global and local feature information, thereby achieving better results.  
      关键词:medical image segmentation;U-Net from a channel-wise perspective with Transformer (UCTransNet);dense connection;self-attention mechanism;refinement module   
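      The Dice and IoU coefficients reported above are standard overlap metrics for binary segmentation masks. The following minimal Python sketch is an editorial illustration (not code from the paper) of one common way to compute both; the function name, array layout, and smoothing constant are assumptions.

```python
import numpy as np

def dice_and_iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7):
    """Compute Dice and IoU for two binary masks (values 0/1) of the same shape."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    pred_sum, target_sum = pred.sum(), target.sum()
    union = pred_sum + target_sum - intersection
    dice = (2.0 * intersection + eps) / (pred_sum + target_sum + eps)
    iou = (intersection + eps) / (union + eps)
    return dice, iou

# Toy example: 4x4 prediction vs. ground truth
pred = np.array([[0, 1, 1, 0]] * 4)
gt = np.array([[0, 1, 0, 0]] * 4)
print(dice_and_iou(pred, gt))  # Dice = 8/12 ≈ 0.667, IoU = 4/8 = 0.5
```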
    • Shi Caijuan,Zheng Yuanfan,Ren Bijuan,Kong Fanyue,Duan Changyu
      Vol. 29, Issue 3, Pages: 725-740(2024) DOI: 10.11834/jig.230279
      Single-domain generalized breast tumor detection in X-ray images
      摘要:ObjectiveBreast tumor detection in X-ray images is a great challenge in the domain of medical image analysis, primarily because of the intrinsic difficulty in discerning lesions due to their significant concealment and propensity for metastasis. Currently, computer-aided diagnosis (CAD) plays a pivotal role in early tumor detection and diagnosis. Remarkable progress has been achieved in detecting breast tumors in X-ray images through deep learning-based object detection methods when the training and testing data are of the same modality. However, the limited availability of medical image data and the labor-intensive and professional nature of data annotation have constrained the detection performance and generalization ability of models. In addition, the presence of domain shift in the unseen domains caused by noise impairs the performance of breast tumor detection across diverse environments. To address these issues, existing studies have proposed different methods, including domain adaptation and domain generalization. However, domain adaptation requires a partition between the target and source domains, while domain generalization requires training the models in multiple domains. Achieving domain division poses a formidable challenge due to the limited availability of medical data. Therefore, in response to these challenges, single-domain methods have been proposed to train the models in a single domain and then they are generalized to the unseen domains in recent years. These methods are well-suited for medical data for aiding in mitigating domain shifts. Though single-domain generalization has been widely applied in classification tasks, its application to object detection tasks remains relatively nascent due to the inherent differences between object detection and classification. Through analysis, we found the single instance only focuses on holistic images for domain alignment in the classification tasks. In contrast, object detection tasks entail the simultaneous consideration of multiple objects within each image, which leads to the mismatch of instances. Thus, we propose a novel instance alignment paradigm to facilitate the single-domain generalization for detecting breast tumors.MethodTo improve the generalization performance for robust breast tumor detection in X-ray images, we propose a novel model called the single-domain generalization model (SDGM). The SDGM is constructed upon the baseline (RetinaNet) and employs Resnet-50 as its backbone. Two pivotal modules, namely, the instance generalization module (IGM) and the domain feature enhancement module (DFEM), are developed. First, the IGM is strategically positioned at the detection head to enhance the generalization performance by normalizing and whitening the category semantic information of each instance. The IGM comprises N sets of 3 × 3 convolutions and the switchable whitening sub-module, which is widely recognized for its effectiveness in extracting instance domain-invariant features in classification tasks. Therefore, IGM is integrated into the classification branch at the detection head. Second, the DFEM is ingeniously devised to efficiently merge the global information from both up-sampling and down-sampling processes while mitigating the impact of noise in medical images. 
To counteract the noise generated by conventional convolution in spatial features, a 3 × 3 convolution is employed to generate a foreground mask image, which serves as the convolution offset to guide the deformable convolution for sampling. Subsequently, channel-wise attention is leveraged to selectively suppress noise within each channel. The DFEM is incorporated into the feature pyramid network to attenuate the noise during the fusion of feature maps at various scales, thereby promoting subsequent domain-invariant feature extraction.ResultTo assess the efficiency of our proposed SDGM, we conduct extensive experiments on the CBIS-DDSM dataset and the INbreast dataset, which is single-domain generalized with multiple domains in the intra-domain. Additionally, we compare the SDGM against several state-of-the-art methods. We also evaluate the inter-domain generalization performance between the CBIS-DDSM and INbreast datasets. In the intra-domain single-domain generalization scenarios, the SDGM consistently outperforms the baseline method (RetinaNet) by a 9.7% increase in mean average precision. Furthermore, it surpasses other one-stage anchor-free methods (e.g., FCOS and FoveaBox), one-stage anchor-based methods (e.g., ATSS and TOOD), two-stage methods (e.g., Faster R-CNN and Cascade-RCNN), and even the transformer-based method PVTv2. In the supervised learning scenarios, the SDGM trained with only 728 images, surpasses RetinaNet, Cascade-RCNN, FoveaBox, and FCOS trained with 5 148 images. This result demonstrates that the SDGM exhibits remarkable generalization capabilities, outperforming supervised methods with substantially less training data. Furthermore, we assess the impact of the attention mechanism on the model performance. Compared with the method TOOD without attention, the SDGM alleviates domain shift to achieve at least a 3.6% improvement in the single-domain generalization scenario. Additionally, compared with PVTv2 and ResNeSt, which employ different attention mechanisms, the SDGM alleviates domain shift to achieve 21.1% and 2.8% improvement respectively, in the single-domain generalization scenarios. In the inter-domain single-domain generalization scenarios, the SDGM displays a performance improvement of 5.8% compared with the baseline. These results indicate that our proposed SDGM not only mitigates performance degradation but also has robustness and generalization capabilities across different datasets.ConclusionIn this study, we develop the SDGM for detecting breast tumors in X-ray images and focus on designing two important components: the DFEM and the IGM. The DFEM improves the performance of SDGM by effectively suppressing the noise in the global information. Meanwhile, the IGM is positioned at the detection head to enhance the generalization ability by normalizing and whitening the category information for each object. We evaluate the SDGM on the INbreast and CBIS-DDSM datasets with multiple benchmarks to evaluate its efficiency. The SDGM can handle domain shift and perform well even with limited labeled medical data, mitigating challenges in medical image analysis. Additionally, the SDGM exhibits robustness across different environmental conditions. In summary, the SDGM offers a promising solution to improving breast tumor detection in X-ray images, making a valuable impact on clinical practice.  
      关键词:breast tumor detection in X-ray images;single-domain generalization;domain shift;normalization and whitening;feature enhancement   
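      The DFEM described above uses channel-wise attention to suppress noise within individual channels. As a rough illustration of that building block only (not the authors' implementation; the module name and reduction ratio are assumptions), a squeeze-and-excitation style channel gate can be sketched in PyTorch as follows.

```python
import torch
import torch.nn as nn

class ChannelGate(nn.Module):
    """SE-style channel attention: learn per-channel weights to suppress noisy channels."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # squeeze: global context per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                              # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                   # excite: rescale each channel

feat = torch.randn(2, 256, 32, 32)                     # e.g., a feature-pyramid map
print(ChannelGate(256)(feat).shape)                    # torch.Size([2, 256, 32, 32])
```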
    • Xu Wangwang,Xu Liangfeng,Li Bokai,Zhou Xi,Lyu Na,Zhan Shu
      Vol. 29, Issue 3, Pages: 741-754(2024) DOI: 10.11834/jig.230130
      TransAS-UNet: regional segmentation of breast cancer by fusing the Swin Transformer and UNet algorithms
      摘要:ObjectiveBreast cancer is a serious and high-morbidity disease in women. Early detection of breast cancer is an important problem that needs to be solved all over the world. The current diagnostic methods for breast cancer include clinical, imaging, and histopathological examinations. The commonly used imaging methods are X-ray, computed tomography (CT), and magnetic resonance imaging, among which mammograms have been used for early cancer detection; however, manually segmenting the mass from a local mammogram is a very time-consuming and error-prone task. Therefore, an integrated computer-aided diagnosis (CAD) system is needed to help radiologists perform automatic and precise breast mass identification.MethodIn this work, we compared different image segmentation models based on the deep learning image segmentation framework. Based on the UNet structure, we adopt the Swin architecture to replace the downsampling and upsampling processes in the segmentation task, realizing the interaction between local and global features. We also use a Transformer to obtain more global information and different hierarchical features, replacing the short connections and realizing multi-scale feature fusion for accurate segmentation. After the segmentation stage, we use a Multi-Attention ResNet classification network to identify the category of the cancer region for better diagnosis and treatment of breast cancer. During segmentation, the Swin Transformer and atrous spatial pyramid pooling (ASPP) modules are used to replace the common convolution layers by analogy with the UNet structure. The shifted window and multi-head attention are used to integrate feature information inside each image slice and extract complementary information between non-adjacent areas. At the same time, the ASPP structure can achieve self-attention of local information with an increasing receptive field. A Transformer structure is introduced to correlate information between different layers to prevent the loss of important shallow-layer information during downsampling convolution. The final architecture not only inherits the Transformer’s advantages in learning global semantic associations but also uses different levels of features to preserve more semantics and details in the model. As the input of the classification network, the binarized images obtained by the segmentation model can be used to identify different categories of breast cancer tumors. Based on ResNet50, this classification model adds multiple types of attention modules and operations that suppress overfitting. Squeeze-and-excitation (SE) and selective kernel (SK) attention can optimize network parameters so that the network pays attention only to the differences in segmented regions, improving the efficiency of the model. The proposed model achieved accurate segmentation of the mass on the breast cancer X-ray dataset INbreast, and we also compared it with five segmentation structures: UNet, UNet++, Res18_UNet, MultiRes_UNet, and Dense_UNet. After the segmentation model, a more accurate binary map of the cancer region was obtained. Problems such as the blending of feature information across different levels and the limited local self-attention of the convolutional layers exist in the upsampling and downsampling of the UNet structure. Therefore, the Swin Transformer structure, which has a sliding window operation and hierarchical design, is adopted. 
The Swin block computes attention mainly through the window attention module and the shifted window attention module, which enable the input feature map to be sliced into multiple windows. The weight of each window is shifted in accordance with the shifted self-attention, and the position of the entire feature map is shifted accordingly, realizing information interaction within the same feature map. In upsampling and downsampling, we use four Swin Transformer structures, and in the fusion process, we use the pyramid ASPP structure to replace the common channel-addition operation on feature maps. This structure applies multiple convolution kernels to the feature maps and fuses them along the channel dimension, and the given input can be sampled in parallel with atrous convolution at different sampling rates, capturing image context information at multiple scales. In order to better integrate high- and low-dimensional spatial information, we propose a new multi-scale feature map fusion strategy and use a Transformer with skip connections to enhance spatial domain information representation. Each cancer image was classified into normal, mass, deformation, and calcification according to the description of the INbreast dataset. Each category was labeled and then sent to the classification network. The classification model we adopted takes ResNet50 as the baseline model. On this basis, two different kinds of attention, i.e., SE and SK, are added. SK convolution replaces the 3 × 3 convolution at every bottleneck, so more image features can be extracted at the convolutional layer. Meanwhile, SE is a channel attention mechanism, and each channel can be weighted before the pixel values are output. Three methods, namely, Gaussian error gradient descent, label smoothing, and partial data enhancement, are introduced to improve the accuracy of the model.ResultIn the same parameter environment, the intersection over union (IoU) value reached 95.58%, and the Dice coefficient was 93.45%, which was 4%–6% higher than those of the other segmentation models. The binary segmentation images were classified into four categories, and the accuracy reached 95.24%.ConclusionExperiments show that our proposed TransAS-UNet image segmentation method demonstrates good performance and clinical significance, and it is superior to other 2D medical image segmentation methods.  
      关键词:breast cancer;deep learning;medical image segmentation;TransAS-UNet;image classification   
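      Both this abstract and the gland-segmentation work above rely on ASPP, i.e., parallel atrous convolutions with different dilation rates plus a pooled global branch whose outputs are concatenated and fused. The PyTorch sketch below is an editorial illustration of that generic idea; the dilation rates, channel sizes, and class name are assumptions, not the papers' exact configurations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MiniASPP(nn.Module):
    """Parallel atrous convolutions + global pooling, concatenated and fused by 1x1 conv."""
    def __init__(self, in_ch: int, out_ch: int, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1)] +
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates]
        )
        self.pool_branch = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(in_ch, out_ch, 1))
        self.fuse = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [b(x) for b in self.branches]
        g = F.interpolate(self.pool_branch(x), size=(h, w), mode="bilinear", align_corners=False)
        return self.fuse(torch.cat(feats + [g], dim=1))

print(MiniASPP(64, 64)(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```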
    • Zhang Die,Huang Hui,Ma Yan,Huang Bingcang,Lu Weiping
      Vol. 29, Issue 3, Pages: 755-767(2024) DOI: 10.11834/jig.230338
      Prostate MR image segmentation network with edge information enhancement
      摘要:ObjectiveProstate cancer, which is an epithelial malignancy that occurs in the prostate, is one of the most common malignant diseases. Early detection of potentially cancerous prostate is important to reduce the prostate cancer mortality. Magnetic resonance imaging (MRI) is one of the most commonly used imaging methods for detecting prostate in clinical practice and commonly used for the detection, localization, and segmentation of prostate cancer. Formulating suitable medical plans for patients and postoperative record is important. In computer-aided diagnosis, extracting the prostate region from the image and further calculating the corresponding characteristics are often necessary for physiological analysis and pathological research to assist clinicians in making accurate judgments. The current methods of MRI prostate segmentation can be divided into two categories: traditional and deep-learning-based methods. The traditional segmentation method is based on the analysis of the features extracted from the image with knowledge of image processing. The effect of this kind of method depends on the performance of the extracted features, and this method sometimes requires manual interaction. In recent years, deep learning technology has been widely applied to image segmentation with the continuous development of computer technology. Unlike visible light images, medical images have special characteristics: large grayscale range, unclear boundaries, and the human organs have a relatively stable distribution in the human body. Considering these characteristics of medical images, a fully convolution-based U-Net was first proposed in 2015, which is a neural network model for solving the problem of medical image segmentation. Compared with other networks, U-Net has obvious advantages for medical image segmentation, but it still has some weaknesses that must be overcome. On the one hand, the dataset of medical images is not huge, but the traditional U-Net model has numerous parameters, which can easily lead to network overfitting. On the other hand, during feature extraction, the edge information of the image is lost. Furthermore, the small-scale information of the target object is difficult to save. The feature map obtained by U-Net’s skip connections usually contains noise, resulting in low model segmentation accuracy. To solve the above problems, this paper proposes an improved U-Net 2D prostate segmentation model, that is, AIM-U-Net, which can enhance the edge information between tissues and organs. AIM-U-Net can also reduce the influence of image noise, thereby improving the effect of prostate segmentation.MethodTo solve the overfitting problem, we redesign the encoder and decoder structure of the original U-Net, and the ordinary convolution is replaced with deep separable convolution. Deep separable convolution can effectively reduce the number of parameters in the network, thereby improving the computational efficiency, generalization ability, and accuracy of the model. In addition, we optimize the decoder features through the efficient channel attention module to amplify and retain information on small-scale targets. Moreover, edge information can provide fine-grained constraints to guide the feature extraction during segmentation. The features of shallow coding units retain sufficient edge information due to their high resolution, while the features extracted by the deep coding unit capture global feature information. 
Therefore, we designed the edge information module (EIM) to integrate the shallow features of the encoder and the high-level semantic information to obtain and enhance the edge information. Therefore, the obtained feature map has rich edge information and advanced semantic information. The EIM has two main functions. First, it can provide edge information and guide the segmentation process in the decoding path. Second, the edge detection loss of early convolutional layers is supervised by adopting a deep supervision mechanism. Moreover, the features extracted from different modules have their own advantage. The features of the deep coding unit can capture the global high-level discriminant feature information of the prostate, which is extremely helpful for the segmentation of small lesions. The multi-scale feature of the decoding unit has rich spatial semantic information, which can improve the accuracy of segmentation. The fusion information obtained by the EIM has rich edge information and advanced semantic information. Therefore, we design an edge information pyramid module (EIPM), which comprehensively uses the above different information by fusing the edge information, the deep features of the coding unit, and the multi-scale features of the decoding unit, so that the segmentation model can understand the image more comprehensively and improve the accuracy and robustness of segmentation. The EIPM can guide the segmentation process in the decoding path by fusing multi-scale information and can supervise the region segmentation loss of the decoder’s convolutional layer using the deep supervision mechanism. In the neural network segmentation task, the feature map obtained by feature fusion usually contains noise, decreasing the segmentation accuracy. To solve this problem, we use the atrous spatial pyramid pooling (ASPP) to process the enhanced edge feature map obtained by the EIPM, and the obtained multi-scale features are concatenated. ASPP resamples the fusion feature map through dilation convolution with different dilation rates, which can capture multi-scale context information, eliminate the noise of multi-scale features, and obtain a more accurate prostate representation. Hence, the segmentation result is obtained by 1 × 1 convolution with one output channels, whose dimension is the same as that of the input image. Finally, to accelerate the convergence speed of the network, we design a deep supervision mechanism to improve the convergence speed of the model and realize deep supervision mechanism through 1 × 1 convolution and activation function. Regarding the loss function of the whole model, we used a hybrid function of Dice loss and cross entropy loss. The total loss of the model includes the final segmentation loss, the edge segmentation loss, and the four region segmentation losses.ResultWe use the PROMISE12 dataset to verify the effectiveness of the model and compare the result with those of six other medical image segmentation methods based on U-Net. The experimental results show that the segmented images are remarkably improved in Dice coefficient (DC), 95% Hausdorff distance (HD95), recall, Jaccard coefficient (Jac), and accuracy. The DC is 8.87% higher than that of U-Net, and the HD95 value is 12.04 mm and 3.03 mm lower than those of U-Net++ and Attention U-Net, respectively.ConclusionThe edge of the segmented prostate is more refined using our proposed AIM-U-Net than that using other methods. 
AIM-U-Net can extract more edge details of the prostate by utilizing the EIM and the EIPM and effectively suppress similar background information and the noise surrounding the prostate.  
      关键词:medical image segmentation;prostate;magnetic resonance images(MRI);U-Net;edge information   
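      AIM-U-Net, as described above, replaces ordinary convolutions with depthwise separable convolutions to reduce the parameter count. The generic sketch below is an editorial illustration of that substitution (kernel size and channel numbers are assumptions); it also prints the parameter counts to show the reduction.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise conv (per-channel) followed by 1x1 pointwise conv (channel mixing)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

def n_params(m):
    return sum(p.numel() for p in m.parameters())

standard = nn.Conv2d(64, 128, 3, padding=1)
separable = DepthwiseSeparableConv(64, 128)
print(n_params(standard), n_params(separable))  # 73856 vs. 8960: far fewer parameters
```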
    • Zhang Yihan,Bai Zhengyao,You Yilin,Li Zekai
      Vol. 29, Issue 3, Pages: 768-781(2024) DOI: 10.11834/jig.230275
      Adaptive modal fusion dual encoder MRI brain tumor segmentation network
      摘要:ObjectiveAccurate segmentation of brain tumors is a challenging clinical diagnosis task, especially in assessing the degree of malignancy. The magnetic resonance imaging (MRI) of brain tumors exhibits various shapes and sizes, and the accurate segmentation of small tumors plays a crucial role in achieving accurate assessment results. However, due to the significant variability in the shape and size of brain tumors, their fuzzy boundaries make tumor segmentation a challenging task. In this paper, we propose a multi-modal MRI brain tumor image segmentation network, named D3D-Net, based on a dual encoder fusion architecture to improve the segmentation accuracy. The performance of the proposed network is evaluated on the BraTS2018 and BraTS2019 datasets.MethodThe paper proposes a network that utilizes multiple encoders and a feature fusion strategy. The network incorporates dual-layer encoders to thoroughly extract image features from various modal combinations, thereby enhancing the segmentation accuracy. In the encoding phase, a targeted fusion strategy is adopted to fully integrate the feature information from both upper and lower sub-encoders, effectively eliminating redundant features. Additionally, the encoding-decoding process employs an expanded multi-fiber module to capture multi-scale image features without incurring additional computational costs. Furthermore, an attention gate is introduced in the process to preserve fine-grained details. We conducted experiments on the BraTS2018, BraTS2019, and BraTS2020 datasets, including ablation and comparative experiments. We used the BraTS2018 training dataset, which consists of the magnetic resonance images of 210 high-grade glioma (HGG) and 75 low-grade glioma (LGG) patients. The validation dataset contains 66 cases. The BraTS2019 dataset added 49 HGG cases and 1 LGG case on top of the BraTS2018 dataset. Specifically, BraTS2018 is an open dataset that was released for the 2018 Brain Tumor Segmentation Challenge. The dataset contains multi-modal magnetic resonance images of HGG and LGG patients, including T1-weighted, T1-weighted contrast-enhanced, T2-weighted, and fluid-attenuated inversion recovery (FLAIR) image sequences. T1-weighted, T1-weighted contrast-enhanced, T2-weighted, and FLAIR images are all types of MRI sequences used to image the brain. T1-weighted MRI scans emphasize the contrast between different tissues on the basis of the relaxation time of the hydrogen atoms in the brain. In T1-weighted images, the cerebrospinal fluid appears dark, while the white matter appears bright. This type of scan is often used to detect structural abnormalities in the brain, such as tumors, and assess brain atrophy. T1-weighted contrast-enhanced MRI scans involve the injection of a contrast agent into the bloodstream to improve the visualization of certain types of brain lesions. This type of scan is particularly useful in detecting tumors because the contrast agent tends to accumulate in abnormal tissues. T2-weighted MRI scans emphasize the contrast between different tissues on the basis of the water content in the brain. In T2-weighted images, the cerebrospinal fluid appears bright, while the white matter appears dark. This type of scan is often used to detect areas of brain edema or inflammation. FLAIR MRI scans are similar to T2-weighted images but with the suppression of signals from the cerebrospinal fluid. 
This type of scan is particularly useful in detecting abnormalities in the brain that may be difficult to visualize with other types of scans, such as small areas of brain edema or lesions in the posterior fossa. The dataset is divided into two subsets: the training and validation datasets. The training dataset includes 285 cases, including 210 HGG and 75 LGG patients. The validation dataset includes 66 cases.ResultThe proposed D3D-Net exhibits superior performance compared with the baseline 3D U-Net and DMF-Net models. Specifically, on the BraTS2018 dataset, the D3D-Net achieves a high average Dice coefficient of 79.7%, 89.5%, and 83.3% for enhancing tumors, whole tumors, and tumor core segmentation, respectively. Result shows the effectiveness of the proposed network in accurately segmenting brain tumors of different sizes and shapes. The D3D-Net also demonstrated an improvement in segmentation accuracy compared with the 3D U-Net and DMF-Net models. In particular, compared with the 3D U-Net model, D3D-Net showed a significant improvement of 3.6%, 1.0%, and 11.5% in enhancing tumors, whole tumors, and tumor core segmentation, respectively. Additionally, compared with the DMF-Net model, D3D-Net respectively demonstrated an improvement of 2.2%, 0.2%, and 0.1% in the same segmentation tasks. On the BraTS2019 dataset, D3D-Net also achieved high accuracy in segmenting brain tumors. Specifically, the network achieved an average Dice coefficient of 89.6%, 91.4%, and 92.7% for enhancing tumors, whole tumors, and tumor core segmentation, respectively. The improvement in segmentation accuracy compared with the 3D U-Net model was 2.2%, 0.6%, and 7.1%, respectively, for enhancing tumors, whole tumors, and the tumor core segmentation. Results suggest that the proposed D3D-Net is an effective and accurate approach for segmenting brain tumors of different sizes and shapes. The network’s superior performance compared with the 3D U-Net and DMF-Net models indicates that the dual encoder fusion architecture, which fully integrates multi-modal features, is crucial for accurate segmentation. Moreover, the high accuracy achieved by D3D-Net in both the BraTS2018 and BraTS2019 datasets demonstrates the robustness of the proposed method and its potential to aid in the accurate assessment of brain tumors, ultimately improving clinical diagnosis. On the BraTS2020 dataset, the average Dice values for enhanced tumor, whole tumor, and tumor core increased by 2.5%, 1.9%, and 2.2%, respectively, compared with those on 3D U-Net.ConclusionThe proposed dual encoder fusion network, D3D-Net, demonstrates a promising performance in accurately segmenting brain tumors from MRI images. The network can improve the accuracy of brain tumor segmentation, aid in the accurate assessment of brain tumors, and ultimately improve clinical diagnosis. The proposed network has the potential to become a valuable tool for radiologists and medical practitioners in the field of neuro-oncology.  
      关键词:brain tumor segmentation;multimodal fusion;dual encoder;magnetic resonance imaging (MRI);attention gate   
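      The abstract above mentions an attention gate introduced during encoding-decoding to preserve fine-grained details. As a rough sketch of the commonly used additive attention-gate formulation in 3D (an editorial illustration, not necessarily the exact D3D-Net design; channel sizes and the assumption of matched spatial sizes are simplifications):

```python
import torch
import torch.nn as nn

class AttentionGate3D(nn.Module):
    """Additive attention gate: a gating signal g reweights the skip feature x."""
    def __init__(self, x_ch: int, g_ch: int, inter_ch: int):
        super().__init__()
        self.theta_x = nn.Conv3d(x_ch, inter_ch, 1)
        self.phi_g = nn.Conv3d(g_ch, inter_ch, 1)
        self.psi = nn.Conv3d(inter_ch, 1, 1)
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x, g):
        # x (skip feature) and g (gating signal) are assumed to share the same spatial size here
        a = self.sigmoid(self.psi(self.relu(self.theta_x(x) + self.phi_g(g))))
        return x * a  # suppress irrelevant regions in the skip connection

x = torch.randn(1, 32, 16, 16, 16)   # skip feature from the encoder
g = torch.randn(1, 64, 16, 16, 16)   # gating signal from a deeper layer (already upsampled)
print(AttentionGate3D(32, 64, 16)(x, g).shape)  # torch.Size([1, 32, 16, 16, 16])
```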

      Image Understanding and Computer Vision

    • Xu Guangzhu,Wu Mengqi,Qian Yifan,Wang Yang,Liu Rong,Zhou Jun,Lei Bangjun
      Vol. 29, Issue 3, Pages: 782-797(2024) DOI: 10.11834/jig.230251
      Automatic capture for standard fetal cardiac four-chamber ultrasound view by fusing frame sequential relationships
      摘要:ObjectiveFirst-rank scan planes usually cannot be easily captured well because of the frequent pause and screenshot operations and fetal random movements when ultrasound sonographers manually scan a fetal heart region. This limitation discourages efficient screenings. When deep neural networks designed for visual object detection or classification are adapted for automatically capturing fetal cardiac ultrasound scan planes, these networks usually end up with a high false detection rate. One possible reason is that they cannot ensure focusing on the fine-grained features within the relatively small cardiac region. Moreover, optimal scanning moments for different cardiac parts are usually asynchronous, in which case object detection networks tend to miss numerous potential scan planes if they rely on counting coexisting cardiac parts at a moment. To solve the preceding problems, our study focuses on the most critical fetal cardiac ultrasound scan plane, namely four-chamber(4CH) scan plane, and proposes an automatic four-chamber scan plane extraction algorithm by simultaneously combining object detection and classification networks and considering the relationships of key video frames.MethodTo solve the problem emanating from the lack of public datasets of the four-chamber fetal echocardiographic image, 512 echocardiographic videos of 14- to 28-week-old fetuses were collected from our partners. Each video was recorded by experienced sonographers with mainstream ultrasound equipments. Most of these videos consist of continuous scan views from the gastric vesicle to the heart and to the three vessels thereafter. When labeling the standard four-chamber planes, to ensure that the detection model learns considerable information on the standard four-chamber scan plane, the standard four-chamber plane dataset used in subsequent experiments was manually screened from the image frames of video Nos. 1–100 and Nos. 144–512 to ensure each image has positive sample targets. In addition, the four-chamber heart region and the descending aorta(DAO) region in each image were labeled. Thereafter, these standard four-chamber scan planes were divided into training, verification, and test sets according to the ratio of 5∶2∶3. They were used for subsequent training and evaluation of the detection model on the standard four-chamber scan plane image set. During the training of the detection and classification models, the YOLOv5x network was first trained with the marked four-chamber scan plane image dataset. Thereafter, the trained detection model was used to evaluate the previously unmarked video frames (regarded as non-standard four-chamber planes) under the appropriate threshold setting. The false detected images were extracted as the negative dataset for the following classification model’s training. Lastly, the four-chamber regions were extracted to train the Darkent53 classification model according to the position coordinates of manually labeled (as standard) and mistakenly detected by YOLOv5x (as non-standard) four-chamber regions. During the reasoning process, the trained detection model was first used to achieve rapid and accurate locating of the four-chamber and descending aorta regions. 
Thereafter, when a descending aorta region was detected in a video frame within a certain time window, the candidate regions containing the four-chamber objects were extracted and sent to the classification model, which was well trained with the self-built qualified four-chamber region dataset, to further classify the qualified four-chamber regions. Lastly, the reliable descending aorta region was determined through the time series relationship. The score of a standard four-chamber scan plane was calculated by a weighted sum of the detection confidence of the reliable descending aorta and the quality metrics of the four-chamber regions of those frames in the same time window.ResultGiven that there are several standard four-chamber scan planes in any fetal cardiac ultrasound video and that this research mainly studies the optimal automatic extraction of the standard four-chamber scan planes, we focus mainly on the false detection rate when analyzing the performance of the YOLOv5x (for detection) and Darknet53 (for classification) modules before and after their combination. The objective is to achieve a relatively low false detection rate while keeping the missed detection rate acceptable. Experimental results show that with the detection confidence threshold increasing (0.3–0.9), the false detection rate of YOLOv5x gradually decreases (from 36.25% to 11.20%), but the missed detection rate continuously increases (from 0.31% to 27.17%). This result indicates that a low false detection rate cannot be ensured by merely adjusting the detection confidence of YOLOv5x: as the confidence threshold increases, its missed detection rate also increases, so determining whether a standard four-chamber heart region exists in each frame is not possible by simply adjusting the detection confidence threshold. When the detection confidence threshold is set to 0.3 and the Darknet53 classification module is added, although the system’s missed detection rate increases by 19.72%, the false detection rate decreases by 35.18%. When the detection confidence threshold is 0.4–0.6, after the Darknet53 classification module is combined, although there are still a few missed detections for the entire system, its false detection rate is significantly reduced compared with the case when only the YOLOv5x detection module is used. Moreover, when the confidence threshold is 0.5, the overall error rate of the system reaches its lowest level of 21.06%, and the false detection rate decreases from 30.25% to 0.87% (a decrease of 29.38%). When the detection confidence level is 0.7–0.9, combining the Darknet53 classification module can further reduce the false detection rate of the system, but the missed detection rate increases with the confidence level (from 20.96% to 40.22%). Therefore, to ensure a low false detection rate while keeping the missed detection rate as low as possible, a confidence threshold of 0.5 and an intersection-over-union threshold of 0.5 are adopted in this study. Although the experimental data show that the missed detection rate is nearly 21% in the best situation, the false detection rate is the key index for the practical problems faced in this research. Through the proposed algorithm, the false detection rate can be reduced to under 1%. 
In real application scenarios, an effective four-chamber video frame will often appear multiple times; in this case, a low false detection rate combined with a relatively high missed detection rate can still meet actual needs.ConclusionExperimental results show that combining the object detection and classification networks with the inter-frame sequential information can effectively reconcile the trade-off between false detection and missed detection and significantly reduce the false detection rate. Lastly, the proposed algorithm can automatically extract standard four-chamber planes and also recommend the best one, which has good practical application value.  
      关键词:deep learning;convolutional neural network(CNN);object detection;image classification;frame sequential relationships   
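      The plane score above is described as a weighted sum of the reliable descending-aorta detection confidence and the 4CH region quality within a time window. The toy Python sketch below is an editorial illustration of that scoring logic only; the data layout, window size, and weights are assumptions, not the paper's settings.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class FrameResult:
    dao_conf: Optional[float]   # descending-aorta detection confidence, None if not detected
    fourch_quality: float       # classifier quality score of the 4CH region (0 if absent)

def plane_score(frames: List[FrameResult], idx: int, window: int = 5,
                w_dao: float = 0.4, w_4ch: float = 0.6) -> float:
    """Score frame `idx` from DAO confidence and 4CH quality inside a time window."""
    lo, hi = max(0, idx - window), min(len(frames), idx + window + 1)
    dao_confs = [f.dao_conf for f in frames[lo:hi] if f.dao_conf is not None]
    if not dao_confs:                       # no reliable DAO in the window -> reject frame
        return 0.0
    return w_dao * max(dao_confs) + w_4ch * frames[idx].fourch_quality

video = [FrameResult(None, 0.2), FrameResult(0.9, 0.8), FrameResult(0.85, 0.95)]
best = max(range(len(video)), key=lambda i: plane_score(video, i))
print(best, plane_score(video, best))       # recommends the frame with the highest score
```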
    • Lu Senliang,Feng Bao,Xu Kuncai,Chen Yehang,Chen Xiangmeng
      Vol. 29, Issue 3, Pages: 798-810(2024) DOI: 10.11834/jig.230160
      Personalized federated medical image classification with adaptive transfer robust features
      摘要:ObjectivePatient data cannot be shared among medical institutions due to medical data confidentiality regulations, considerably limiting data scale. Federated learning ensures that all clients can train local models and aggregate global models in a decentralized manner without sharing data. However, the heterogeneity of medical data substantially affects the aggregation and deployment of global models in federated learning. In most federated learning methods, the aggregation of global model parameters is achieved by multiplying fixed weights with the local model parameters and then summing them. The local model personalization method requires a large number of manual experiments to select the appropriate model layer for personalization construction. Although these methods can realize the aggregation of global models or the construction of personalized local models, they cannot automatically aggregate global model parameters and construct personalized local models. Moreover, they are not tailored to the heterogeneous characteristics of the data. Therefore, an adaptive personalized federated learning algorithm via feature transfer (APFFT) is proposed. This algorithm can automatically identify and select robust features for personalized local model construction and global model aggregation. It can also suppress and filter heterogeneous feature information.MethodTo construct a personalized local model, a robust feature selection network (RFS-Net) was proposed in this study. RFS-Net can automatically identify and select features by calculating transfer weights and the amount of feature transfer on the basis of model representation. When transferring features from a global model to a local model, RFS-Net constructs transfer loss functions on the basis of transfer weights and the amount of feature transfer to constrain the local model and strengthen its attention toward effective transfer features. In the aggregation of the global model, the adaptive aggregation network (AA-Net) was proposed to transfer features from the local model to the global model. AA-Net updated the transfer weight and constructed the aggregation loss on the basis of the cross-entropy change of the global model for filtering the heterogeneous feature information of each local model. In this study, PyTorch was used to build and train the models, while ResNet18 was used for the convolutional neural network (CNN) structure. RFS-Net and AA-Net were composed of fully connected, pooling, softmax, and ReLU6 layers. The parameters of RFS-Net, AA-Net, and the CNN were updated via stochastic gradient descent with a momentum of 0.9. Experiments were conducted on three medical image datasets: the nonpublic dataset of pulmonary adenocarcinoma and tuberculosis classification, the public dataset Camelyon17, and the public dataset LIDC. The dataset for pulmonary adenocarcinoma and tuberculosis classification came from 5 hospitals, with 1 009 cases, among which Center 1 (training set n = 260, test set n = 242), Center 2 (training set n = 34, test set n = 54), Center 3 (training set n = 39, test set n = 40), Center 4 (training set n = 145, test set n = 108), and Center 5 (training set n = 36, test set n = 51) were used in the experiment. The learning rate and decay rate of RFS-Net and AA-Net were both 0.000 1, while the learning rate and decay rate of the CNN were 0.001 and 0.000 5, respectively. Focal loss was used to calculate cross-entropy. 
In addition, gender, age, and nodule size in clinical information are of considerable reference value in the diagnosis of tuberculosis and lung adenocarcinoma. Therefore, we provided statistics for this information, and the results showed that in Center 2, the overall age and nodule size were small, while in Center 4, the overall nodule size was large, exhibiting a certain gap with the global average level. Camelyon17 was composed of 450 000 histological images from 5 hospitals. In the experiment, the learning rate and decay rate of the CNN, RFS-Net, and AA-Net were all 0.000 1. Standard cross-entropy was used to constrain CNN training. LIDC data came from 7 research institutions and 8 medical image companies, with 1 018 cases. Lesions with Grades 1 to 2 malignancies were classified as benign, while those with Grades 4 to 5 malignancies were classified as malignant. Finally, 1 746 lesions were included in the dataset to simulate the federated learning application scenario. The lesions were then randomly divided into 4 centers in accordance with the cases. Center 1 (training set n = 254, test set n = 169), Center 2 (training set n = 263, test set n = 190), Center 3 (training set n = 305, test set n = 124), and Center 4 (training set n = 247, test set n = 194) were used in the experiment. The learning rate and decay rate of RFS-Net and AA-Net were both 0.000 1. The learning rate and decay rate of the CNN were 0.001 and 0.000 1, respectively. The cross-entropy loss was calculated using standard cross-entropy.ResultThree types of medical image classification tasks were compared with four existing methods. The evaluation indexes included receiver operating characteristic (ROC) and accuracy. The experimental results showed that in the tuberculosis lung adenocarcinoma classification task, the center test sets of the end-to-end area under the ROC curve (AUC) were 0.791 5, 0.798 1, 0.76, 0.705 7, and 0.806 9. In the breast cancer histological image classification task, the center test sets of end-to-end accuracy were 0.984 9, 0.980 8, 0.983 5, 0.982 6, and 0.983 4. In the pulmonary nodule benign and malignancy classification task, the center test sets of the end-to-end AUC were 0.809 7, 0.849 8, 0.784 8, and 0.792 3.ConclusionThe federated learning method proposed in this study can reduce the influence of heterogeneous characteristics and realize the adaptive construction of personalized local models and the adaptive aggregation of global models. The results show that our model is superior to several existing federated learning methods, and model performance is considerably improved.  
      关键词:feature transfer;federated learning;heterogeneity features;robust feature selection network;adaptive aggregation network;medical image classification   
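      The abstract above notes that most federated methods aggregate the global model through a fixed weighted sum of local parameters, which APFFT replaces with adaptive, feature-level aggregation. For context only, the sketch below illustrates the conventional fixed-weight (FedAvg-style) parameter aggregation; it is background, not the proposed AA-Net, and the weighting by local sample counts is an assumption.

```python
import copy
from typing import Dict, List
import torch

def weighted_average(local_states: List[Dict[str, torch.Tensor]],
                     weights: List[float]) -> Dict[str, torch.Tensor]:
    """Fixed-weight parameter aggregation: global = sum_k w_k * local_k."""
    total = sum(weights)
    norm = [w / total for w in weights]
    global_state = copy.deepcopy(local_states[0])
    for key in global_state:
        global_state[key] = sum(w * s[key].float() for w, s in zip(norm, local_states))
    return global_state

# Toy example: three "clients" holding the same tiny model architecture
models = [torch.nn.Linear(4, 2) for _ in range(3)]
states = [m.state_dict() for m in models]
sizes = [260.0, 34.0, 145.0]              # e.g., local training-set sizes as fixed weights
aggregated = weighted_average(states, sizes)
models[0].load_state_dict(aggregated)     # deploy the aggregated global model
```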
    • Lu Hao,Chen Jinling,Chen Jie,Chen Baihe,Tang Zhuowei
      Vol. 29, Issue 3, Pages: 811-822(2024) DOI: 10.11834/jig.230353
      Double-tier multiple instance learning model for histopathology image classification
      摘要:ObjectiveWhole slide imaging, which refers to scanning a complete microscope slide and converting it into a digital whole slide image (WSI), is an efficient technique for visualizing tissue sections in disease diagnosis, medical education, and pathological research. Analysis of histopathology WSIs is the gold standard for pathology diagnosis. However, analyzing pathological WSIs is a tedious and time-consuming task, and the diagnosis result is easily influenced by personal experience. The increasing use of WSIs in histopathology means that digital pathology provides huge improvements in pathologists’ workflow and diagnosis decision-making, but it also stimulates the need for computer-aided diagnostic tools for WSIs. At present, a significant number of experts and scholars have begun exploring the application of deep learning in the field of pathological image analysis. WSIs possess gigapixel resolution and usually lack pixel-level annotations. Existing deep learning techniques are developed for small-sized conventional images. Therefore, applying these techniques directly to WSI analysis is not feasible. Weakly supervised multiple instance learning (MIL) is a powerful method for analyzing WSIs, and the key component is how to effectively discover, among massive instances, the crucial instance that triggers the prediction and summarize valuable information from different instances. Previous methods were primarily designed based on the independent and identically distributed (i.i.d.) hypothesis, disregarding the relationships among different instances and the heterogeneity of tumors. To solve these problems, a novel double-tier MIL (DT-MIL) model is proposed.MethodThe proposed method consists of three aspects: 1) pre-processing operation of WSIs, 2) convolutional neural network (CNN)-based feature encoding, and 3) feature fusion of instance embeddings. First, WSIs are cropped into fixed-sized image patches using a sliding window strategy, filtering out invalid background regions and retaining only the foreground areas containing pathological tissues. Second, the CNN-based feature encoder encodes the image patches into fixed-length feature embeddings. Lastly, the proposed DT-MIL model is deployed in the feature fusion part. DT-MIL contains two MIL models in series. The Tier-1 MIL model, also known as the adaptive feature miner, is applied to generate negative and positive internal queries. The Tier-2 MIL model consists of deep non-linear and double-detection cross-attention modules. The former maps the instance features in the bag, while the latter is applied to generate a bag-level representation for final classification. In particular, Tier-1’s adaptive feature miner applies the idea of Grad-CAM to provide a reliable probability distribution of instances under the AB-MIL framework. Thereafter, highly reliable features are retrieved and aggregated to generate an internal query for each subclass. Moreover, the adaptive feature miner flexibly selects K discriminative instances to generate reliable internal queries to mitigate the constraints of tumor heterogeneity on model performance and avoid introducing false information. In addition, the adaptive feature miner considers positive and negative instances to prevent a biased decision boundary. Tier-2 aims to produce a robust bag-level representation for subsequent classifiers by simultaneously modeling the relationship among the positive query, the negative query, and the instances in the bag. 
Aggregating all instances from the bag by establishing the connections among the positive query, the negative query, and each instance simultaneously can supplement the feature information and also enable the model to remain sensitive to positive and negative instances. Consequently, the model is prevented from being biased against negative instances, and its robustness is improved. An in-domain feature encoder pre-trained with the self-supervised contrastive learning framework SimCLR is also introduced into the proposed model to generate more robust feature embeddings.ResultThis study performs comparison and ablation experiments on two publicly available datasets, namely, CAMELYON-16 and TCGA lung cancer. First, we compared six classical multi-instance learning models. Experimental results show that the proposed model performs optimally and achieves significant improvements in accuracy, precision, and recall. In the CAMELYON-16 dataset, testing accuracy, precision, and recall for binary tumor classification reached 95.35%, 95.91%, and 94.27%, respectively. In the TCGA lung cancer dataset, testing accuracy, precision, and recall for cancer subtype classification achieved 91.87%, 91.92%, and 91.83%, respectively. The proposed method achieved accuracy rates 2.33% and 0.96% higher than the state-of-the-art methods in the CAMELYON-16 and TCGA lung cancer datasets, respectively. Second, we conducted ablation experiments on the proposed model to verify the effectiveness of its key components. Experimental results show that sequentially adding the feature extractor, adaptive feature miner, and dual-path cross-detection module helped improve the accuracy of the model by 31.78%, 3.1%, and 0.78%, respectively. Lastly, we compared the proposed adaptive feature miner with traditional K-means clustering and Top-K instance aggregation. Experimental results indicate that the adaptive feature miner can flexibly extract discriminative features, thereby generating optimal internal queries.ConclusionThe proposed DT-MIL model simultaneously considers the correlation between instances and tumor heterogeneity. It can better mine the internal feature information of histopathological images and significantly improve the detection accuracy. This result demonstrates the effectiveness of the proposed model in pathological diagnosis and in accurately locating the lesion region. These aspects have high application value in pathology-assisted diagnostic scenarios.  
      关键词:multiple instance learning(MIL);histopathological image;self-supervised contrastive learning;weakly supervised learning;deep learning   
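      DT-MIL builds on the AB-MIL idea of attention-weighted aggregation of instance embeddings into a single bag representation. The sketch below is an editorial illustration of a generic gated-attention MIL pooling layer, not the full double-tier model; the embedding and hidden dimensions are assumptions.

```python
import torch
import torch.nn as nn

class GatedAttentionPooling(nn.Module):
    """Aggregate N instance embeddings into one bag embedding via learned attention weights."""
    def __init__(self, dim: int = 512, hidden: int = 128):
        super().__init__()
        self.V = nn.Linear(dim, hidden)
        self.U = nn.Linear(dim, hidden)
        self.w = nn.Linear(hidden, 1)

    def forward(self, instances: torch.Tensor):
        # instances: (N, dim) feature embeddings of the patches in one WSI bag
        a = self.w(torch.tanh(self.V(instances)) * torch.sigmoid(self.U(instances)))
        a = torch.softmax(a, dim=0)            # (N, 1) attention over instances
        bag = (a * instances).sum(dim=0)       # (dim,) bag-level representation
        return bag, a

bag_feats = torch.randn(1000, 512)             # e.g., 1000 patch embeddings from one slide
bag_repr, attn = GatedAttentionPooling()(bag_feats)
print(bag_repr.shape, attn.shape)              # torch.Size([512]) torch.Size([1000, 1])
```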
    • Hou Yingsa,Sun Zheng,Sun Meichen
      Vol. 29, Issue 3, Pages: 823-838(2024) DOI: 10.11834/jig.221197
      JIR-Net: a joint iterative reconstruction network for photoacoustic tomography image reconstruction
      摘要:ObjectivePhotoacoustic tomography (PAT) is a hybrid functional imaging modality developed rapidly in recent years. PAT is physically based on the photoacoustic effect, where biological tissues are irradiated by short laser pulses, inducing broadband (~MHz) ultrasonic waves (i.e., photoacoustic waves) due to optical absorption and thermoelastic expansion. Ultrasonic transducers deployed around the imaging target collect photoacoustic waves from which images are reconstructed to show the morphological structure and functional properties of tissues. High-quality image reconstruction is essential for PAT, which suffers from incomplete measurements and heterogeneous acoustic properties of tissues. Traditional image reconstruction methods include back projection, time reversal, Fourier transform-based reconstruction, and delay and sum. For simplicity, these methods are usually based on ideal assumptions about the imaging scenario, such as fixed speed of sound, a lossless acoustic media without attenuation, a point-like ultrasonic detector with sufficient bandwidth, and complete measurement. However, in real-world applications, these ideal scenarios often do not occur, leading to the degradation of the quality of images reconstructed using these methods. The model-based iterative reconstruction scheme is commonly used to improve image quality, where the inversion of a forward imaging model describing the generation of photoacoustic signal is iteratively solved. However, its real-time applications are limited by its high computational cost because the forward imaging operator and its adjoint operator need to be calculated repeatedly in the iterative process. Regularization tools with properly defined parameters are necessary to obtain stable optimization. In addition, the reconstruction quality highly depends on the prior assumptions of the imaging object. In recent years, deep learning has shown great potential in reconstructing high-quality images from photoacoustic measurements. This work aims to solve the problem of image quality degradation caused by incomplete measurement and heterogeneously distributed speed of sound.MethodA deep learning method is proposed to reconstruct jointly images representing the distributions of optical absorption and speed of sound within the imaging domain from incomplete photoacoustic measurements. A convolutional neural network, named joint iterative reconstruction network (JIR-Net), is constructed based on an iterative learning strategy. Incomplete photoacoustic measurements are fed into the network, and images representing absorbed optical energy density and speed of sound distributions are output. The network consists of four structural units, and each unit is composed of three modules: feature extraction, feature fusion, and reconstruction. The feature extraction module extracts features from four inputs via convolution. The feature fusion module combines the features extracted from the input. Finally, the reconstruction module recreates the distributions of absorbed optical energy density and speed of sound. The network is trained using simulation, phantom, and in vivo datasets, where the gradient descent information of the absorbed optical energy density and the speed of sound distributions is incorporated into the network training. The nonlinear least square problem is solved by using the back propagation gradient descent. The validity of the method is demonstrated by simulation, phantom, and in vivo studies. 
Compared with traditional nonlearning methods, non-iterative learning method, and learning iterative method based on depth gradient descent, JIR-Net is superior in reconstructing high-quality images from sparse data measured in acoustically heterogeneous media.ResultNumerical simulation, phantom, and in vivo experiment results show the trained JIR-Net is robust to data sparsity and insensitive to the initial iterative plan. Moreover, the superiority of JIR-Net in complex structure reconstruction is proven in vivo. Compared with the depth gradient descent method, the U-Net post-processing method, and the alternate optimization method, the structural similarity of the reconstructed images representing absorbed optical energy density distribution can be improved by 7.6%, 26.4%, and 39.5%, respectively, and the peak signal-to-noise ratio can be improved by 15.5%, 71.4%, and 95.6%, respectively. Compared with the alternate optimization method, the structural similarity and peak signal-to-noise ratio of the JIR-Net reconstructed speed of sound images are increased by 34.4% and 22.6%, respectively.ConclusionJIR-Net achieves the mapping from incomplete photoacoustic measurements to high-quality images representing the distributions of absorbed optical energy density and speed of sound. The method can be used for image reconstruction from limited-view sparse photoacoustic signals collected with any measuring geometry in acoustic media with inhomogeneously distributed speed of sound. The method eliminates the need for prior knowledge of the characteristics of the imaging target and reduces the need for scanning and detection equipment, enabling constructing more compact imaging systems.  
      关键词:image reconstruction techniques;photoacoustic tomography(PAT);deep learning;absorbed optical energy density;speed of sound(SoS);joint reconstruction;gradient descent   
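      The model-based iterative scheme described above repeatedly applies a forward operator and its adjoint, which is the kind of loop that JIR-Net learns and unrolls. As orientation only, the numpy sketch below shows plain gradient-descent reconstruction for a generic linear forward model y = A x with Tikhonov regularization; the toy operator, step-size rule, and regularization weight are illustrative assumptions, not the paper's method.

```python
import numpy as np

def gd_reconstruct(A: np.ndarray, y: np.ndarray, n_iter: int = 2000,
                   lam: float = 1e-3) -> np.ndarray:
    """Regularized least-squares reconstruction by gradient descent."""
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 + lam)   # safe step size from the operator norm
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y) + lam * x           # forward model and its adjoint per iteration
        x -= step * grad
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(80, 64))                        # toy forward operator (sparse measurements)
x_true = rng.normal(size=64)
y = A @ x_true + 0.01 * rng.normal(size=80)          # noisy measurements
x_hat = gd_reconstruct(A, y)
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))  # small relative error
```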