A survey of medical image captioning technique: encoding, decoding and latest advance
2023, Vol. 28, No. 7, pp. 1990-2010
Print publication date: 2023-07-16
DOI: 10.11834/jig.211021
Zhu Yi, Li Xiu. 2023. A survey of medical image captioning technique: encoding, decoding and latest advance. Journal of Image and Graphics, 28(07):1990-2010
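As medical imaging technology continues to improve, the number of medical reports that radiologists must write each day keeps growing. Since the rise of deep learning, deep-learning-based medical image captioning has been used to generate medical reports automatically, with notable results. This paper comprehensively surveys recent work on deep medical image captioning, including the latest methods, datasets, and evaluation metrics in the field, analyzes their respective strengths and weaknesses, and presents them organized by model structure; it is the first survey in China devoted to the medical image captioning task. Current deep medical image captioning techniques mainly extend the encoder-decoder structure, including but not limited to adding retrieval methods, template matching methods, attention mechanisms, reinforcement learning, and knowledge graphs. Retrieval and template matching methods are simple, but because of the special nature of medical reports they still perform reasonably well on this task. The attention mechanism lets the model focus on a particular part of the image and text when producing a report, and it has been adopted by almost all mainstream models. Reinforcement learning overcomes the bottleneck that gradient-descent training does not match the discrete language generation evaluation metrics used in medical image captioning. Knowledge graph methods incorporate human physicians' prior knowledge of diseases and effectively improve the clinical accuracy of generated reports. In addition, new structures such as the Transformer are increasingly replacing the recurrent neural network (RNN) and even the convolutional neural network (CNN) as the network backbone. Finally, this paper discusses the problems that deep medical image captioning still needs to solve and future research directions, in the hope of helping the technology reach real-world deployment.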
Medical image captioning is a labor-intensive daily task for radiologists. Emerging deep medical image captioning techniques have the potential to generate medical captions automatically. Several challenges need to be resolved: 1) organizing a feasible and clear structure for readers; 2) strengthening the deep medical image captioning task itself; and 3) optimizing the methods introduced. First, the aims and objectives are identified. Then, the literature on deep medical image captioning up to 2021 is reviewed, covering the latest methods, datasets, and evaluation metrics, together with a comparative analysis of the medical and generic image captioning tasks. The techniques are introduced according to their network structure. Current deep medical image captioning is mainly built on the encoder-decoder structure and extends it with retrieval-based methods, template matching methods, attention mechanisms, reinforcement learning, and knowledge graphs. Specifically, the encoder-decoder structure typically uses a convolutional neural network (CNN) for image feature extraction and a recurrent neural network (RNN) for caption generation, and the two networks are linked by an intermediate vector called the context vector. Some models are based on a CNN-RNN-RNN structure, known as a hierarchical RNN or hierarchical long short-term memory (LSTM). This structure stacks two RNNs: one generates topic vectors, and the other generates the caption under the supervision of those topic vectors. Retrieval-based and template matching methods are relatively simple, yet they remain effective because medical captions are highly repetitive and follow special sentence patterns. The attention mechanism lets the model focus on a particular part of the image or sentence while the caption is generated, and the length of the context vector becomes variable.
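To make the attention step concrete, the following is a minimal sketch of soft attention between a decoder state and a set of image region features. It is an illustration only: the dot-product scoring, the function names `softmax` and `attend`, and the toy vectors are assumptions chosen for brevity, not a method taken from any surveyed paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, decoder_state):
    """Dot-product soft attention (an illustrative scoring choice).

    region_features: one feature vector per image region, as a CNN encoder
        might produce (hypothetical toy values here).
    decoder_state: current hidden state of the decoder, same dimension.
    Returns (weights, context); context is the attention-weighted sum
    of the region features.
    """
    # Score each region by its dot product with the decoder state.
    scores = [sum(f * h for f, h in zip(feat, decoder_state))
              for feat in region_features]
    weights = softmax(scores)
    dim = len(decoder_state)
    # Weighted sum over regions, one output value per feature dimension.
    context = [sum(w * feat[d] for w, feat in zip(weights, region_features))
               for d in range(dim)]
    return weights, context


# Toy example: three regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
weights, context = attend(regions, state)
print(weights, context)
```

The context is recomputed at every decoding step, so regions that align with the current decoder state contribute more to the word being generated. Here the first and third regions receive equal, larger weights than the second.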
Reinforcement learning (RL) for medical image captioning alleviates the mismatch between gradient-descent training and discrete language generation evaluation metrics. RL can also work in a multi-agent form that guides the form of the output before the decoder runs, which helps produce well-balanced and logical medical content. A knowledge graph integrates experts' prior knowledge into the model: diseases with similar features occupy nearby nodes in the graph, and disease information is updated through graph convolution. Integrating a medical knowledge graph aims to improve the clinical accuracy of the generated report. These methods are compatible with one another; for example, template matching methods and attention-based RL can be used simultaneously. In addition, Transformer-related structures are being developed intensively as a new backbone network in place of the RNN and the CNN. The Transformer, or self-attention block, can be trained in parallel and captures long-range dependencies between tokens, which makes it a better feature extractor. Popular datasets for deep medical image captioning are IU X-Ray and MIMIC-CXR, which pair frontal and lateral chest X-ray images with reports made up of multiple sentences. Medical annotations such as Medical Subject Headings (MeSH) or Unified Medical Language System (UMLS) keywords help generate more accurate reports because they can be treated as extra information, and classifying these tags can serve as a pretraining task. Generic natural language generation metrics are used to evaluate the reports generated by deep medical image captioning models. Newer metrics such as SPICE, SPIDEr, and BERTScore have been developed in addition to the established BLEU-n, ROUGE, METEOR, and CIDEr scores.
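As an illustration of how a generic n-gram metric scores a report, the sketch below computes the clipped n-gram precision that sits at the core of BLEU-n. It is a simplified sketch under stated assumptions: it uses a single reference, omits BLEU's brevity penalty and the geometric mean over n, and the example sentences are invented for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    """Modified (clipped) n-gram precision against one reference.

    Each candidate n-gram is credited at most as many times as it
    occurs in the reference.
    """
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    if not cand_counts:
        return 0.0
    overlap = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return overlap / sum(cand_counts.values())


# Hypothetical reference and generated sentences.
reference = "the heart size is normal".split()
candidate = "the heart is normal".split()
opposite = "the heart size is abnormal".split()

p1 = clipped_precision(candidate, reference, 1)
p2 = clipped_precision(candidate, reference, 2)
p_opposite = clipped_precision(opposite, reference, 1)
print(p1, p2, p_opposite)
```

In this toy case the sentence with the opposite clinical finding still reaches a unigram precision of 0.8, because four of its five words match the reference. Surface-overlap metrics of this kind therefore do not directly measure clinical correctness.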
Finally, four future research directions are outlined. 1) More diverse and more accurate datasets, covering other modalities such as magnetic resonance imaging (MRI) and color Doppler ultrasound. Current datasets mostly consist of chest X-ray images, which limits them to a single body part and a single modality; broader data would make models more robust and adaptable to various tasks. 2) Evaluation metrics that are more clinically accurate and more cost-effective than generic natural language generation (NLG) metrics such as BLEU and ROUGE. Existing generic NLG metrics are not the best measure of quality in medicine, and better automatic metrics would reduce the demand on radiologists' time. 3) Unsupervised and semi-supervised methods that lower the dataset cost of the medical image captioning task. Building on existing pre-trained models such as ViLBERT and VL-BERT can reduce both the cost and the number of training samples required. 4) Integrating more prior knowledge into the model, and making multi-round conversational medical report generation more detailed.
Keywords: deep learning (DL); medical image captioning; automatic radiology report generation; encoder-decoder; image captioning