A multi-scale dynamic visual network for surgical robots scene segmentation
2024, pp. 1-16
Online publication date: 2024-10-16
DOI: 10.11834/jig.240385
Liu Min, Qin Dunxuan, Han Yubin, et al. A multi-scale dynamic visual network for surgical robots scene segmentation[J]. Journal of Image and Graphics, 2024: 1-16
Objective
Robot-assisted laparoscopic surgery refers to surgery that clinicians complete with the help of an endoscopic surgical robot. However, endoscopic surgery is performed inside closed body cavities, and the features of the segmentation targets are complex and variable, placing high demands on the surgeon's skills. To assist surgeons in performing endoscopic surgery, this paper proposes a high-precision endoscopic surgical scene segmentation method and builds a split endoscopic surgical robot to validate the proposed method.
Method
First, this paper proposes the multi-scale dynamic visual network (MDVNet), which adopts an encoder-decoder structure. In the encoder, the dynamic large kernel attention module (DLKA) extracts multi-scale features of different segmentation targets through multi-scale large kernel attention and fuses them adaptively through a dynamic selection mechanism. In the decoder, the low-rank matrix decomposition module (LMD) guides the fusion of feature maps of different resolutions and can effectively filter out noise in the feature maps, while the boundary guided module (BGM) guides the model to learn the boundary features of the surgical scene. Finally, this paper presents a split endoscopic surgical robot built on the Lap Game laparoscopic simulator; the segmentation results of the network model can be mapped into the robot's field of view to assist surgeons in performing endoscopic surgery.
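As an illustration of the dynamic selection idea described above, below is a minimal PyTorch sketch of a multi-branch large-kernel attention block with softmax-normalized branch weights; the class name and internal layout are illustrative assumptions, not the authors' implementation (the kernel sizes 7, 11, and 21 follow the paper).

```python
import torch
import torch.nn as nn

class DynamicLargeKernelAttention(nn.Module):
    """Multi-branch large-kernel attention with dynamic branch selection (sketch)."""

    def __init__(self, channels: int, kernel_sizes=(7, 11, 21)):
        super().__init__()
        # One depthwise large-kernel convolution per scale (7, 11, 21 in the paper).
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        ])
        # Dynamic selection: predict one weight per branch and channel from
        # globally pooled features, normalized across branches with a softmax.
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.select = nn.Conv2d(channels, channels * len(kernel_sizes), 1)
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.stack([b(x) for b in self.branches], dim=1)  # (B, K, C, H, W)
        b, k, c, _, _ = feats.shape
        weights = self.select(self.gap(x)).view(b, k, c, 1, 1)
        weights = torch.softmax(weights, dim=1)   # adaptive weights across branches
        fused = (weights * feats).sum(dim=1)      # multi-scale feature fusion
        return x * self.proj(fused)               # attention applied to the input

# Example: y = DynamicLargeKernelAttention(64)(torch.randn(1, 64, 32, 32))
```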
Result
MDVNet achieves state-of-the-art results on three surgical scene datasets, with mean intersection over union of 51.19%, 71.28%, and 52.47%, respectively.
Conclusion
This paper proposes MDVNet, a multi-scale dynamic visual network suited to endoscopic surgical scene segmentation, and validates the proposed method on a purpose-built split endoscopic surgical robot. The code is open-sourced at https://github.com/YubinHan73/MDVNet.
Objective
Robot-assisted endoscopic surgery refers to surgery performed with the help of intelligent endoscopic surgical robots, which can effectively reduce trauma, shorten the recovery period, and improve surgical success rates. Endoscopic surgical scene segmentation uses deep learning techniques to accurately segment the entire surgical scene, where the targets include anatomical areas and instruments. However, endoscopic surgery is completed in a closed human body cavity, and the whole process is accompanied by frequent cutting, traction, and other surgical operations, which makes the features of the segmentation targets complex and variable. In advancing robot-assisted surgery, it is crucial to develop high-precision surgical scene segmentation algorithms that can assist surgeons in performing surgeries. In this paper, we propose an innovative surgical scene segmentation network named multi-scale dynamic visual network (MDVNet), which aims to address three major challenges in endoscopic surgical scene segmentation: target size variation, complex intraoperative noise, and indistinguishable boundaries.
Method
MDVNet adopts an encoder-decoder structure. In the encoder, the dynamic large kernel attention (DLKA) module extracts multi-scale features of the surgical scene. The DLKA module consists of multiple branches, each equipped with large kernel convolutions of a different size (7, 11, and 21 in this paper), allowing the network to capture both fine details and broader context for targets of different sizes. In addition, a dynamic selection mechanism adaptively fuses these features to meet the needs of segmentation targets of different sizes in the endoscopic surgical scene. This newly designed module directly addresses target size variation, a major obstacle for previous surgical scene segmentation methods. In the decoder, this paper further proposes two key modules, the low-rank matrix decomposition module (LMD) and the boundary guided module (BGM), to address the challenges of complex intraoperative noise and indistinguishable boundaries in endoscopic images. The core idea of LMD is to separate the noise components from the useful feature information in the feature map through low-rank matrix decomposition. In endoscopic surgical scenes, noise arises in surgical images from motion blur, blood splashes, and water mist on the tissue surface caused by surgical operations, all of which reduce the segmentation accuracy of the network. LMD decomposes the feature map into a low-rank matrix and a sparse matrix through non-negative matrix factorization, where the low-rank matrix contains the main feature information of the image, while the sparse matrix contains the noise and outliers. Through this process, LMD effectively removes the noise and provides high-quality feature maps for the subsequent segmentation task. In surgical scenes, the boundaries between different tissues and instruments are highly indistinguishable due to contact, occlusion, or similar texture features. To solve this problem, BGM uses boundary-sensitive Laplacian convolutions and ordinary convolutions to compute the boundary maps of the ground truth and of the highest-resolution feature maps, respectively. BGM then combines a cross-entropy loss and a Dice loss to guide the network to learn boundary features, making the network pay more attention to boundary regions during training and thus improving its ability to recognize boundaries. To apply the proposed MDVNet to actual surgical scenarios and verify its effectiveness, the team built a split laparoscopic surgical robot platform designed around the practical operational needs of the surgical process, integrating the Lap Game endoscopic simulator, Franka robotic arms, and a high-precision endoscopic imaging system. Users can manipulate the instrument-equipped robotic arms through control handles and complete endoscopic operations such as cutting, dissection, and suturing in the simulator. The segmentation results of the network are displayed on the user console to assist the surgeon in performing endoscopic surgery.
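To make the LMD idea concrete, here is a hedged sketch of low-rank feature-map filtering via plain multiplicative-update non-negative matrix factorization; the function name, rank, and iteration count are illustrative assumptions rather than the authors' settings.

```python
import torch

def low_rank_filter(feat: torch.Tensor, rank: int = 8, iters: int = 6,
                    eps: float = 1e-6) -> torch.Tensor:
    """Return a low-rank reconstruction of a (B, C, H, W) feature map (sketch)."""
    b, c, h, w = feat.shape
    v = feat.clamp(min=0).reshape(b, c, h * w)           # NMF needs non-negative input
    wm = torch.rand(b, c, rank, device=feat.device)      # basis matrix
    hm = torch.rand(b, rank, h * w, device=feat.device)  # coefficient matrix
    for _ in range(iters):                               # multiplicative NMF updates
        hm = hm * (wm.transpose(1, 2) @ v) / (wm.transpose(1, 2) @ wm @ hm + eps)
        wm = wm * (v @ hm.transpose(1, 2)) / (wm @ hm @ hm.transpose(1, 2) + eps)
    # W @ H is the low-rank part holding the main structure; the residual plays
    # the role of the sparse part carrying noise and outliers, and is discarded.
    return (wm @ hm).reshape(b, c, h, w)
```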
Result
To fully validate the effectiveness and potential of the proposed MDVNet, we conducted a comprehensive comparative analysis against other advanced surgical scene segmentation methods on three different surgical scene datasets: the robotic surgical scene dataset (Endovis2018), the cataract surgical scene dataset (CaDIS), and the minimally invasive laparoscopic surgery dataset (MILS). The experimental results show that MDVNet achieves the best segmentation results on all three datasets, with mean intersection over union (mIoU) of 51.19% on Endovis2018, 71.28% on CaDIS (Task III), and 52.47% on MILS. The visualization results on the three datasets also illustrate that MDVNet can effectively segment multiple targets such as surgical instruments and anatomical areas in surgical scenes. Moreover, we conducted a series of ablation experiments on the Endovis2018 dataset with the three modules DLKA, LMD, and BGM. The results demonstrate that the modules in MDVNet are complementary and, when combined, produce a positive gain for the whole method. Finally, the proposed MDVNet is deployed on the laparoscopic surgical robot, and the segmentation results of the network are superimposed on the original surgical images to assist the surgeon in performing laparoscopic surgery.
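For reference, the mIoU metric reported above averages per-class intersection over union computed from a confusion matrix; the following is a minimal, purely illustrative sketch of that computation.

```python
import torch

def mean_iou(pred: torch.Tensor, target: torch.Tensor, num_classes: int) -> float:
    """pred, target: flattened (N,) integer label tensors."""
    idx = target * num_classes + pred                   # joint (target, pred) index
    conf = torch.bincount(idx, minlength=num_classes ** 2)
    conf = conf.view(num_classes, num_classes).float()  # rows: target, cols: pred
    inter = conf.diag()
    union = conf.sum(0) + conf.sum(1) - inter
    iou = inter / union.clamp(min=1)                    # absent classes count as 0 here
    return iou.mean().item()                            # average over classes
```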
Conclusion
To solve the three major challenges of endoscopic surgical scene segmentation, namely target size variation, complex intraoperative noise, and indistinguishable boundaries, this paper proposes an innovative surgical scene segmentation network named multi-scale dynamic visual network (MDVNet). MDVNet is composed of three modules: DLKA, LMD, and BGM. In the encoder, DLKA extracts the multi-scale features of different segmentation targets through multi-scale large kernel attention and performs adaptive feature fusion through a dynamic selection mechanism, effectively reducing the misidentification caused by target size variation. In the decoder, LMD first filters out the noise in the feature maps to obtain high-quality feature maps. BGM then guides the model to learn the boundary features of the surgical scene by computing the loss between the boundary maps of the feature maps and those of the ground truth. MDVNet achieves state-of-the-art results on three different surgical scene datasets: Endovis2018, CaDIS, and MILS. Code is available at https://github.com/YubinHan73/MDVNet.
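As a complement to the repository above, here is a hedged sketch of the boundary supervision described for BGM: a Laplacian kernel extracts a boundary map from the ground-truth mask, and a combined cross-entropy and Dice loss supervises a boundary map predicted from the highest-resolution features. The binary formulation and all names are assumptions, not the released code.

```python
import torch
import torch.nn.functional as F

LAPLACE = torch.tensor([[[[-1., -1., -1.],
                          [-1.,  8., -1.],
                          [-1., -1., -1.]]]])           # 3x3 Laplacian kernel

def gt_boundary(mask: torch.Tensor) -> torch.Tensor:
    """mask: (B, 1, H, W) integer class map -> binary boundary map."""
    edges = F.conv2d(mask.float(), LAPLACE.to(mask.device), padding=1)
    return (edges.abs() > 0).float()                    # nonzero at class transitions

def boundary_loss(pred_logits: torch.Tensor, mask: torch.Tensor,
                  smooth: float = 1.0) -> torch.Tensor:
    """pred_logits: (B, 1, H, W) boundary logits from high-resolution features."""
    target = gt_boundary(mask)
    bce = F.binary_cross_entropy_with_logits(pred_logits, target)
    prob = torch.sigmoid(pred_logits)
    inter = (prob * target).sum()
    dice = 1 - (2 * inter + smooth) / (prob.sum() + target.sum() + smooth)
    return bce + dice   # cross-entropy plus Dice, binary form assumed
```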
Keywords: endoscopic surgical robots; semantic segmentation; large kernel convolution; low-rank matrix decomposition; boundary segmentation