Double-tier multiple instance learning model for histopathology image classification

Lu Hao; Chen Jinling; Chen Jie; Chen Baihe; Tang Zhuowei

doi:10.11834/jig.230353

Image Understanding and Computer Vision


2024, Vol. 29, No. 3, pp. 811-822
Print publication date: 2024-03-16

Lu Hao, Chen Jinling, Chen Jie, Chen Baihe, Tang Zhuowei. 2024. Double-tier multiple instance learning model for histopathology image classification. Journal of Image and Graphics, 29(03): 0811-0822 DOI: 10.11834/jig.230353.

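Summary

Objective

Analysis of histopathology whole slide images (WSIs) is the gold standard for pathological diagnosis. WSIs contain gigapixels and usually lack pixel-level annotations. Weakly supervised multiple instance learning (MIL) is the mainstream approach to WSI analysis, and its key challenge is how to accurately identify, from a large number of instances, the key instances that trigger the class prediction. Previous WSI analysis methods were mainly designed under the independent and identically distributed (i.i.d.) assumption, ignoring the correlations among instances and the heterogeneity of tumors. To address these problems, a novel double-tier multiple instance learning model is proposed.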
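To make the MIL framing concrete: each WSI is treated as a bag whose instances are its patches, and only the bag (slide) label is known. The Python sketch below shows the standard MIL assumption (a bag is positive if at least one instance is positive) and the simplest instance-level aggregator, max pooling, which returns the single highest-scoring patch as the key instance. This is a textbook baseline, not the proposed model: it scores every patch independently, which is exactly the i.i.d. simplification criticized above. The scores are made-up toy values.

```python
from typing import Sequence, Tuple


def bag_label(instance_labels: Sequence[int]) -> int:
    """Standard MIL assumption: a bag is positive (1) iff any instance is positive."""
    return int(any(label == 1 for label in instance_labels))


def max_pooling_bag_score(instance_scores: Sequence[float]) -> Tuple[float, int]:
    """Instance-level max pooling.

    Returns the bag score (the highest instance score) and the index of the
    key instance that produced it. Every instance is scored independently.
    """
    if not instance_scores:
        raise ValueError("a bag must contain at least one instance")
    key_index = max(range(len(instance_scores)), key=lambda i: instance_scores[i])
    return instance_scores[key_index], key_index


# Toy bag of five patches; only patch 3 is tumor (made-up values).
labels = [0, 0, 0, 1, 0]
scores = [0.05, 0.10, 0.02, 0.91, 0.30]
print(bag_label(labels))              # 1
print(max_pooling_bag_score(scores))  # (0.91, 3)
```

Because the prediction depends on one patch at a time, such a baseline cannot use context from neighboring or related patches, which motivates correlation-aware aggregation.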

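Method

Specifically, the proposed model is a cascade of an adaptive feature miner and a dual-path cross-detection module. First, the Tier-1 adaptive feature miner retrieves discriminative features in the bag and generates reliable internal queries for the subsequent aggregation of instance features. Then, the Tier-2 dual-path cross-detection module models the correlation between the internal queries and the instances, aggregating all instances in the bag into the final bag-level representation. In addition, the self-supervised contrastive learning method SimCLR is introduced in the feature extraction stage to generate high-quality instance features.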
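A minimal sketch of the second-tier idea described above, in which each internal query attends over all instances of the bag, is given below. It assumes plain scaled dot-product attention (Vaswani et al., 2017) without the learned query, key, and value projections a trained module would normally include, and it simply concatenates the two attention paths into one bag vector. The function name `dual_path_bag_embedding` and this fusion rule are illustrative assumptions; the abstract does not give the module's exact equations.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # shift by the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query: Vector, instances: List[Vector]) -> Tuple[Vector, List[float]]:
    """Scaled dot-product attention of one query over every instance in the bag."""
    d = len(query)
    weights = softmax([dot(query, h) / math.sqrt(d) for h in instances])
    pooled = [sum(w * h[j] for w, h in zip(weights, instances)) for j in range(d)]
    return pooled, weights


def dual_path_bag_embedding(
    q_pos: Vector, q_neg: Vector, instances: List[Vector]
) -> Tuple[Vector, List[float], List[float]]:
    """Hypothetical dual-path fusion: one attention path per internal query,
    concatenated into a single bag-level vector of length 2 * d."""
    z_pos, w_pos = cross_attend(q_pos, instances)
    z_neg, w_neg = cross_attend(q_neg, instances)
    return z_pos + z_neg, w_pos, w_neg


# Toy bag: instances 0 and 2 look tumor-like, instance 1 looks normal.
instances = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
bag, w_pos, w_neg = dual_path_bag_embedding([1.0, 0.0], [0.0, 1.0], instances)
print(len(bag))  # 4
```

Because each query produces its own attention distribution over the patches, the positive path can concentrate on tumor-like instances while the negative path concentrates on normal tissue, so neither kind of instance is ignored in the bag representation.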

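Result

The proposed model was evaluated on two publicly available datasets, CAMELYON-16 and TCGA (The Cancer Genome Atlas) lung cancer, and compared with six classical multiple instance learning models; the results show that the proposed model performs best. In terms of accuracy, the proposed method reached 95.35% on CAMELYON-16 and 91.87% on TCGA lung cancer, which is 2.33% and 0.96% higher, respectively, than the best of the compared methods.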

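Conclusion

The proposed model can effectively mine the internal feature information of histopathology images and significantly improves detection accuracy, demonstrating its effectiveness in pathological diagnosis. It can also accurately localize lesion regions, which gives it high application value in computer-aided pathology diagnosis.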

Abstract

Objective

Whole slide imaging, which scans a complete microscope slide and converts it into a digital whole slide image (WSI), is an efficient technique for visualizing tissue sections in disease diagnosis, medical education, and pathological research. Analysis of histopathology WSIs is the gold standard for pathological diagnosis. However, analyzing WSIs manually is tedious and time-consuming, and the diagnostic result is easily influenced by the pathologist's personal experience. The increasing use of WSIs in histopathology has brought substantial improvements to pathologists' workflow and diagnostic decision-making, but it has also created a need for computer-aided diagnostic tools for WSIs. A growing number of researchers have therefore begun exploring the application of deep learning to pathological image analysis. WSIs, however, have gigapixel resolution and usually lack pixel-level annotations, whereas existing deep learning techniques were developed for small conventional images, so these techniques cannot be applied to WSI analysis directly. Weakly supervised multiple instance learning (MIL) is a powerful approach to analyzing WSIs; its key challenge is how to effectively discover, among massive numbers of instances, the crucial instances that trigger the prediction and to summarize valuable information from different instances. Previous methods were primarily designed under the independent and identically distributed (i.i.d.) assumption, disregarding the relationships among instances and the heterogeneity of tumors. To solve these problems, a novel double-tier MIL (DT-MIL) model is proposed.
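Because a gigapixel slide cannot be fed to a CNN in one pass, MIL pipelines first turn the WSI into a bag of patches, as in the pre-processing step described under Method. The Python sketch below enumerates sliding-window patch positions and keeps only patches whose tissue fraction, read from a binary foreground mask, reaches a threshold. The mask is assumed to be precomputed (for example by thresholding a low-resolution thumbnail), and the patch size, stride, and `min_tissue` threshold are hypothetical parameters, not values reported in the paper.

```python
from typing import List, Tuple

Mask = List[List[int]]  # 1 = tissue (foreground), 0 = background


def tile_coordinates(width: int, height: int, patch: int, stride: int) -> List[Tuple[int, int]]:
    """Top-left (x, y) corners of every full patch visited by a sliding window."""
    return [
        (x, y)
        for y in range(0, height - patch + 1, stride)
        for x in range(0, width - patch + 1, stride)
    ]


def tissue_fraction(mask: Mask, x: int, y: int, patch: int) -> float:
    """Share of foreground pixels inside one patch of the binary mask."""
    values = [mask[r][c] for r in range(y, y + patch) for c in range(x, x + patch)]
    return sum(values) / len(values)


def foreground_patches(
    mask: Mask, patch: int, stride: int, min_tissue: float = 0.5
) -> List[Tuple[int, int]]:
    """Keep only patches whose tissue fraction reaches min_tissue (hypothetical threshold)."""
    height, width = len(mask), len(mask[0])
    return [
        (x, y)
        for x, y in tile_coordinates(width, height, patch, stride)
        if tissue_fraction(mask, x, y, patch) >= min_tissue
    ]


# Toy 4 x 4 mask: tissue on the left half only.
mask = [[1, 1, 0, 0] for _ in range(4)]
print(foreground_patches(mask, patch=2, stride=2))  # [(0, 0), (0, 2)]
```

For a hypothetical 100,000 x 80,000-pixel slide, `len(tile_coordinates(100_000, 80_000, 256, 256))` is 121,680 patches before background filtering, which is why the slide is handled as a bag of instances rather than as a single image.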

Method

The proposed method consists of three stages: 1) pre-processing of WSIs, 2) convolutional neural network (CNN)-based feature encoding, and 3) fusion of instance embeddings. First, WSIs are cropped into fixed-size image patches with a sliding window; invalid background regions are filtered out, and only foreground areas containing pathological tissue are retained. Second, a CNN-based feature encoder encodes each image patch into a fixed-length feature embedding. Lastly, the proposed DT-MIL model performs the feature fusion. DT-MIL contains two MIL models in series. The Tier-1 MIL model, called the adaptive feature miner, generates positive and negative internal queries. The Tier-2 MIL model consists of a deep non-linear mapping module and a double-detection cross-attention module: the former maps the instance features in the bag, and the latter generates a bag-level representation for the final classification. In particular, Tier-1's adaptive feature miner applies the idea of Grad-CAM to derive a reliable probability distribution over instances within the attention-based MIL (AB-MIL) framework. Highly reliable features are then retrieved and aggregated to generate an internal query for each subclass. Moreover, the adaptive feature miner flexibly selects K discriminative instances to generate reliable internal queries, which mitigates the effect of tumor heterogeneity on model performance and avoids introducing false information. The adaptive feature miner also considers both positive and negative instances to prevent a biased decision boundary. Tier-2 produces a robust bag-level representation for the subsequent classifier by simultaneously modeling the relationships among the positive query, the negative query, and the instances in the bag.
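The Tier-1 steps can be sketched as follows, under stated assumptions. AB-MIL attention follows Ilse et al. (2018): a_k = softmax_k(w · tanh(V h_k)). For the Grad-CAM step, the sketch assumes, as in the DTFD-MIL derivation (Zhang et al., 2022), a linear bag classifier y_c = W_c · z on the pooled feature z = Σ_k a_k h_k; the gradient ∂y_c/∂z is then W_c, so instance k contributes a_k (W_c · h_k) to class c, and a softmax over classes turns this into an instance probability. The K most confident instances per class are then averaged into positive and negative internal queries. K is fixed here because the abstract does not state how the adaptive miner chooses K, and `V`, `w`, and `classifier` are hypothetical toy stand-ins for learned weights.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # shift by the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def abmil_attention(instances: Matrix, V: Matrix, w: Vector) -> List[float]:
    """AB-MIL attention (Ilse et al., 2018): a_k = softmax_k(w . tanh(V h_k))."""
    logits = [dot(w, [math.tanh(dot(row, h)) for row in V]) for h in instances]
    return softmax(logits)


def instance_probabilities(instances: Matrix, attn: List[float], classifier: Matrix) -> Matrix:
    """Grad-CAM-style instance probabilities, assuming a linear bag classifier
    y_c = W_c . z with z = sum_k a_k h_k. Then dy_c/dz = W_c, so instance k
    contributes a_k * (W_c . h_k) to class c; a softmax over classes normalizes it."""
    return [softmax([a * dot(w_c, h) for w_c in classifier]) for h, a in zip(instances, attn)]


def mean_vector(vectors: Matrix) -> Vector:
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def internal_queries(
    instances: Matrix, probs: Matrix, k: int, pos: int = 1, neg: int = 0
) -> Tuple[Vector, Vector]:
    """Average the K most confident instances of each class into one query.
    K is fixed here; the paper's adaptive rule for choosing K is not given in the abstract."""
    by_pos = sorted(range(len(instances)), key=lambda i: probs[i][pos], reverse=True)
    by_neg = sorted(range(len(instances)), key=lambda i: probs[i][neg], reverse=True)
    q_pos = mean_vector([instances[i] for i in by_pos[:k]])
    q_neg = mean_vector([instances[i] for i in by_neg[:k]])
    return q_pos, q_neg


# Toy bag and toy "learned" weights (all hypothetical).
instances = [[2.0, 0.0], [0.0, 2.0], [1.8, 0.2], [0.1, 1.9]]
V = [[1.0, 0.0], [0.0, 1.0]]
w = [1.0, 1.0]
classifier = [[0.0, 1.0], [1.0, 0.0]]  # row 0: negative class, row 1: positive class
attn = abmil_attention(instances, V, w)
probs = instance_probabilities(instances, attn, classifier)
q_pos, q_neg = internal_queries(instances, probs, k=2)
print(q_pos, q_neg)
```

With these toy values, instances 0 and 2 are the most confident positives and instances 1 and 3 the most confident negatives, so the positive query is close to [1.9, 0.1] and the negative query close to [0.05, 1.95]. Building a query from both classes, rather than from positives only, reflects the text's point about avoiding a biased decision boundary.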
Aggregating all instances in the bag by connecting the positive query, the negative query, and each instance simultaneously supplements the feature information and keeps the model sensitive to both positive and negative instances. Consequently, the model avoids a bias against negative instances, and its robustness improves. In addition, an in-domain feature encoder pre-trained with the self-supervised contrastive learning framework SimCLR is introduced to generate more robust feature embeddings.
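SimCLR (Chen et al., 2020) trains the encoder by pulling together two augmented views of the same patch and pushing apart views of different patches, using the normalized temperature-scaled cross-entropy (NT-Xent) loss. The pure-Python sketch below computes that loss for a batch of N patches from already projected embeddings; the augmentations, encoder, and projection head are omitted, and the temperature of 0.5 is an illustrative choice rather than the paper's setting.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def nt_xent(view1: List[Vector], view2: List[Vector], tau: float = 0.5) -> float:
    """NT-Xent loss for N patches; view1[i] and view2[i] are the projected
    embeddings of two augmentations of patch i (a positive pair)."""
    n = len(view1)
    z = view1 + view2  # 2N embeddings; the partner of index i is (i + n) mod 2n
    total = 0.0
    for i in range(2 * n):
        partner = (i + n) % (2 * n)
        # Similarities to every other embedding in the batch (the denominator terms).
        logits = [cosine(z[i], z[k]) / tau for k in range(2 * n) if k != i]
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(v - m) for v in logits))
        total += log_denominator - cosine(z[i], z[partner]) / tau
    return total / (2 * n)


patches = [[1.0, 0.0], [0.0, 1.0]]
print(nt_xent(patches, patches))                   # matched views
print(nt_xent(patches, [[0.0, 1.0], [1.0, 0.0]]))  # partners swapped
```

With this toy batch, matched views give a loss of about 0.24, while swapping the partners raises it to about 2.24: the loss rewards embeddings in which two views of the same patch agree more than views of different patches, which requires no labels and is why it suits unannotated WSI patches.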

Result

This study performs comparison and ablation experiments on two publicly available datasets, namely, CAMELYON-16 and TCGA lung cancer. First, we compared the proposed model with six classical multiple instance learning models. Experimental results show that the proposed model performs best and achieves significant improvements in accuracy, precision, and recall. On the CAMELYON-16 dataset, testing accuracy, precision, and recall for binary tumor classification reached 95.35%, 95.91%, and 94.27%, respectively. On the TCGA lung cancer dataset, testing accuracy, precision, and recall for cancer subtype classification reached 91.87%, 91.92%, and 91.83%, respectively. The proposed method achieved accuracies 2.33% and 0.96% higher than the best competing methods on the CAMELYON-16 and TCGA lung cancer datasets, respectively. Second, we conducted ablation experiments to verify the effectiveness of the key components of the proposed model. Experimental results show that sequentially adding the feature extractor, the adaptive feature miner, and the dual-path cross-detection module improved the accuracy of the model by 31.78%, 3.1%, and 0.78%, respectively. Lastly, we compared the proposed adaptive feature miner with traditional K-means clustering and with Top-K instance aggregation. Experimental results indicate that the adaptive feature miner can flexibly extract discriminative features and thereby generates better internal queries than these alternatives.
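The reported accuracy, precision, and recall can be computed from slide-level predictions as in the sketch below. The abstract does not say how precision and recall are averaged, so macro averaging over classes is an assumption here; for a binary task one might instead report positive-class values. The labels are toy data.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Slide-level accuracy plus macro-averaged precision and recall (averaging is an assumption)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)  # true positives for class c
        fp = sum(t != c and p == c for t, p in pairs)  # false positives for class c
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives for class c
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / len(classes),
        "recall": sum(recalls) / len(classes),
    }


# Toy slide labels: 1 = tumor, 0 = normal (made-up data).
print(classification_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
```

For this toy input the accuracy is 0.6 and both macro scores equal 7/12 (about 0.583); the same function handles the multi-class TCGA subtype setting without changes.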

Conclusion

The proposed DT-MIL model simultaneously considers the correlation between instances and tumor heterogeneity. It can better mine the internal feature information of histopathological images and significantly improves detection accuracy. These results demonstrate the effectiveness of the proposed model in pathological diagnosis; the model can also accurately locate lesion regions, which gives it high application value in pathology-assisted diagnostic scenarios.


Keywords

multiple instance learning (MIL); histopathological image; self-supervised contrastive learning; weakly supervised learning; deep learning

References

Campanella G, Hanna M G, Geneslaw L, Miraflor A, Werneck Krauss Silva V, Busam K J, Brogi E, Reuter V E, Klimstra D S and Fuchs T J. 2019. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nature Medicine, 25(8): 1301-1309 [DOI: 10.1038/s41591-019-0508-1]

Chen J L, Li J, Zhao C M and Liu X. 2022. Research of breast pathological subtype classification on WSI. Application Research of Computers, 39(10): 3167-3173 [DOI: 10.19734/j.issn.1001-3695.2022.03.0087]

Chen T, Kornblith S, Norouzi M and Hinton G. 2020. A simple framework for contrastive learning of visual representations//Proceedings of the 37th International Conference on Machine Learning. Vienna, Austria: PMLR: 1597-1607

Dehaene O, Camara A, Moindrot O, de Lavergne A and Courtiol P. 2020. Self-supervision closes the gap between weak and strong supervision in histology [EB/OL]. [2023-05-20]. https://arxiv.org/pdf/2012.03583.pdf

Feng J and Zhou Z H. 2017. Deep MIML network//Proceedings of the 31st AAAI Conference on Artificial Intelligence. San Francisco, USA: AAAI: 1884-1890 [DOI: 10.1609/aaai.v31i1.10890]

Gao H M, Zhu M, Cao X Y, Li C M, Liu Q and Xu P P. 2023. A micro-hyperspectral image classification method of gallbladder cancer based on multi-scale fusion attention mechanism. Journal of Image and Graphics, 28(4): 1173-1185 [DOI: 10.11834/jig.211201]

Gao Y H, Zhou M and Metaxas D N. 2021. UTNet: a hybrid Transformer architecture for medical image segmentation//Proceedings of the 24th International Conference on Medical Image Computing and Computer-Assisted Intervention. Strasbourg, France: Springer: 61-71 [DOI: 10.1007/978-3-030-87199-4_6]

He K M, Fan H Q, Wu Y X, Xie S N and Girshick R. 2020. Momentum contrast for unsupervised visual representation learning//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 9726-9735 [DOI: 10.1109/CVPR42600.2020.00975]

He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]

Ilse M, Tomczak J and Welling M. 2018. Attention-based deep multiple instance learning//Proceedings of the 35th International Conference on Machine Learning. Stockholm, Sweden: PMLR: 2127-2136

Jia K X, Ma Z H, Zhu R and Li Y G. 2022. Attention-mechanism-based light single shot multiBox detector modelling improvement for small object detection on the sea surface. Journal of Image and Graphics, 27(4): 1161-1175 [DOI: 10.11834/jig.200517]

Kanavati F, Toyokawa G, Momosaki S, Rambeau M, Kozuma Y, Shoji F, Yamazaki K, Takeo S, Iizuka O and Tsuneki M. 2020. Weakly-supervised learning for lung carcinoma classification using deep learning. Scientific Reports, 10(1): #9297 [DOI: 10.1038/s41598-020-66333-x]

Li B, Li Y and Eliceiri K W. 2021. Dual-stream multiple instance learning network for whole slide image classification with self-supervised contrastive learning//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 14313-14323 [DOI: 10.1109/CVPR46437.2021.01409]

Li H L, Zhu C L, Zhang Y L, Sun Y X, Shui Z Y, Kuang W W, Zheng S Y and Yang L. 2023. Task-specific fine-tuning via variational information bottleneck for weakly-supervised pathology whole slide image classification//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 7454-7463 [DOI: 10.1109/CVPR52729.2023.00720]

Lu M Y, Williamson D F K, Chen T Y, Chen R J, Barbieri M and Mahmood F. 2021. Data-efficient and weakly supervised computational pathology on whole-slide images. Nature Biomedical Engineering, 5(6): 555-570 [DOI: 10.1038/s41551-020-00682-w]

Selvaraju R R, Cogswell M, Das A, Vedantam R, Parikh D and Batra D. 2017. Grad-CAM: visual explanations from deep networks via gradient-based localization//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 618-626 [DOI: 10.1109/ICCV.2017.74]

Shao Z C, Bian H, Chen Y, Wang Y F, Zhang J, Ji X Y and Zhang Y B. 2021. TransMIL: Transformer based correlated multiple instance learning for whole slide image classification//Proceedings of the 35th International Conference on Neural Information Processing Systems. [s.l.]: [s.n.]: 2136-2147

Sharma Y, Shrivastava A, Ehsan L, Moskaluk C A, Syed S and Brown D E. 2021. Cluster-to-conquer: a framework for end-to-end multi-instance learning for whole slide image classification//Proceedings of the Medical Imaging with Deep Learning. Lübeck, Germany: PMLR: 682-698

Srinidhi C L, Ciga O and Martel A L. 2021. Deep neural network models for computational histopathology: a survey. Medical Image Analysis, 67: #101813 [DOI: 10.1016/j.media.2020.101813]

Sung H, Ferlay J, Siegel R L, Laversanne M, Soerjomataram I, Jemal A and Bray F. 2021. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: A Cancer Journal for Clinicians, 71(3): 209-249 [DOI: 10.3322/caac.21660]

Tellez D, Litjens G, van der Laak J and Ciompi F. 2021. Neural image compression for gigapixel histopathology image analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(2): 567-578 [DOI: 10.1109/TPAMI.2019.2936841]

Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser L and Polosukhin I. 2017. Attention is all you need//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates: 6000-6010

Wang X G, Yan Y L, Tang P, Bai X and Liu W Y. 2018. Revisiting multiple instance neural networks. Pattern Recognition, 74: 15-24 [DOI: 10.1016/j.patcog.2017.08.026]

Wang Z H, Yu L Q, Ding X, Liao X H and Wang L S. 2022a. Lymph node metastasis prediction from whole slide images with Transformer-guided multiinstance learning and knowledge transfer. IEEE Transactions on Medical Imaging, 41(10): 2777-2787 [DOI: 10.1109/TMI.2022.3171418]

Wang Z K, Bi Y, Pan T, Wang X Y, Bain C, Bassed R, Imoto S, Yao J H and Song J N. 2022b. Multiplex-detection based multiple instance learning network for whole slide image classification [EB/OL]. [2023-05-20]. https://arxiv.org/pdf/2208.03526.pdf

Xu G, Song Z G, Sun Z, Ku C, Yang Z, Liu C C, Wang S H, Ma J P and Xu W. 2019. CAMEL: a weakly supervised learning framework for histopathology image segmentation//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 10681-10690 [DOI: 10.1109/ICCV.2019.01078]

Zeng Y H, Fu J L and Chao H Y. 2020. Learning joint spatial-temporal transformations for video inpainting//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 528-543 [DOI: 10.1007/978-3-030-58517-4_31]

Zhang H R, Meng Y D, Qian X S, Yang X Y, Coupland S E and Zheng Y L. 2021. A regularization term for slide correlation reduction in whole slide image analysis with deep learning//Proceedings of the 4th Conference on Medical Imaging with Deep Learning. Lübeck, Germany: PMLR: 143: 842-854

Zhang H R, Meng Y D, Zhao Y T, Qiao Y H, Yang X Y, Coupland S E and Zheng Y L. 2022. DTFD-MIL: double-tier feature distillation multiple instance learning for histopathology whole slide image classification//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 18780-18790 [DOI: 10.1109/CVPR52688.2022.01824]

Zhang X N, Wang T T, Qi J Q, Lu H C and Wang G. 2018. Progressive attention guided recurrent network for salient object detection//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 714-722 [DOI: 10.1109/CVPR.2018.00081]

Zhu W T, Lou Q, Vang Y S and Xie X H. 2017. Deep multi-instance networks with sparse label assignment for whole mammogram classification//Proceedings of the 20th International Conference on Medical Image Computing and Computer-Assisted Intervention. Quebec City, Canada: Springer: 603-611 [DOI: 10.1007/978-3-319-66179-7_69]

Zhu X Z, Su W J, Lu L W, Li B, Wang X G and Dai J F. 2021. Deformable DETR: deformable Transformers for end-to-end object detection//Proceedings of the 9th International Conference on Learning Representations. [s.l.]: OpenReview.net
