Real-time citrus picking point localization guided by joint learning network
2024, Vol. 29, No. 10, Pages: 3130-3143
Print publication date: 2024-10-16
DOI: 10.11834/jig.230755
Liang Yun, Liu Yunfan, Lin Yishen, Jiang Weipeng, Huang Zifan. 2024. Real-time citrus picking point localization guided by joint learning network. Journal of Image and Graphics, 29(10):3130-3143
Objective
Citrus is one of the most common fruits in China. At present it is mostly picked by hand, and problems such as high cost and low efficiency severely constrain large-scale production, so automatic citrus picking has become a research hotspot in recent years. However, citrus grows in complex environments, its branches take varied shapes, and branches, leaves, and fruits heavily occlude one another, so locating picking points accurately and in real time is the key to automated picking. By constructing a cascaded hybrid network model, this paper proposes a general and efficient method for the automatic, accurate localization of citrus picking points.
Method
We construct a cluster-box generation model and a sparse branch instance segmentation model, and cascade the two to localize citrus picking points in real time. First, we build a citrus fruit detection network and propose the cluster-box generation model, which uses feature extraction, fruit detection box generation, and DBSCAN (density-based spatial clustering of applications with noise) fruit density clustering to produce, in real time, the coordinates of the cluster box containing the most fruits in the image. Second, we propose a sparse branch segmentation model fused with a brightness prior; it takes the image inside the cluster box as input, which effectively reduces interference from background branches, and segments the branch instances connected to the fruits in real time through sparse instance activation maps fused with the brightness prior. Finally, the picking point coordinates are searched on the basis of the segmentation result.
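The cluster-box generation step can be sketched in a few lines of Python. The fragment below is a minimal illustration, assuming the fruit detector has already produced axis-aligned boxes; the DBSCAN parameters (eps, min_samples) and the no-cluster fallback are illustrative assumptions, not the paper's tuned settings.

```python
# Minimal sketch: group fruit detection box centers with DBSCAN and enclose
# the densest cluster in a single "cluster box".
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_box(boxes_xyxy: np.ndarray, eps: float = 80.0, min_samples: int = 2):
    """boxes_xyxy: (N, 4) fruit detection boxes as [x1, y1, x2, y2]."""
    centers = np.stack([(boxes_xyxy[:, 0] + boxes_xyxy[:, 2]) / 2,
                        (boxes_xyxy[:, 1] + boxes_xyxy[:, 3]) / 2], axis=1)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(centers)
    valid = labels[labels >= 0]              # -1 marks DBSCAN noise points
    if valid.size == 0:                      # no dense cluster: use all boxes
        members = np.arange(len(boxes_xyxy))
    else:
        densest = np.bincount(valid).argmax()  # cluster with the most fruits
        members = np.where(labels == densest)[0]
    m = boxes_xyxy[members]
    # enclosing box of the densest fruit cluster, plus its center point
    x1, y1 = m[:, 0].min(), m[:, 1].min()
    x2, y2 = m[:, 2].max(), m[:, 3].max()
    return (x1, y1, x2, y2), ((x1 + x2) / 2, (y1 + y2) / 2)
```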
Result
Through long-term outdoor collection, we built the citrus fruit detection dataset CFDD and the citrus branch segmentation dataset CBSD. The two datasets contain both mature and immature fruits and cover challenges such as sunny weather, cloudy weather, front lighting, and back lighting, with 37 000 images in total. On these datasets, the picking point localization accuracy of our method reaches 95.77%, and the frame rate reaches 28.21 frames per second (FPS).
Conclusion
Our method makes good progress in fruit picking point localization: it obtains citrus picking points quickly and accurately. A matching robotic-arm picking device is also provided for deploying the localization algorithm in practice, offering strong support for the development of the citrus industry.
Objective
Citrus is one of the most common fruits in China. At present, it is mostly picked by hand, and issues such as high cost and low efficiency severely restrict large-scale production. Automatic citrus picking has therefore become a research hotspot in recent years. However, the growing environment of citrus is complex, its branches take varied shapes, and branches, leaves, and fruits heavily occlude one another, so accurate, real-time localization of the picking point is the crucial step in automated picking. Current research on fruit picking point localization can be broadly divided into two categories: methods that do not use deep learning and methods based on deep learning. The former mainly rely on digital image processing techniques, such as color space conversion, threshold segmentation, and the watershed algorithm, to extract target contours and then design corresponding algorithms for picking point localization; however, they often suffer from low accuracy and efficiency. Deep-learning-based methods train models to perform tasks such as detection or segmentation, treat the model output as an intermediate result, and design task-specific algorithms on the basis of fruit growth characteristics to localize picking points. These methods offer higher accuracy and real-time capability and have therefore become the recent research focus. Nevertheless, most existing picking point localization methods are limited in practical applications, often constrained by end-effector design and idealized application scenarios. This study therefore proposes a universal and efficient method for locating fruit picking points that overcomes these limitations.
Method
This study constructs a cluster-box generation model and a sparse branch instance segmentation model and cascades them to localize citrus picking points in real time. The method runs in three steps. In the first step, the image of the picking field of view is fed into a fruit detector built on a CSPDarknet feature extraction module and a path aggregation network (PANet). Multilevel detection predicts multiple fruit bounding boxes; the cluster box generator then clusters these boxes, selects the cluster box containing the most fruits, and computes its coordinates. In the second step, the image patch inside the cluster box is cropped and passed to the branch sparse segmentation model. Restricting the input to this patch focuses segmentation on the branch region and reduces background interference; a brightness prior guides the weights of the instance activation maps, and the feature decoder learns the branch instance segmentation result. In the third step, on the basis of the branch segmentation result and the cluster box center obtained in the first step, the branch instance masks are clustered to determine the position of each branch relative to the center, and the final picking point coordinates are located by a pixel-wise search.
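To make the second and third steps concrete, the following is a minimal Python sketch, assuming a SparseInst-style decoder has produced K instance activation maps for the cluster-box crop. The elementwise modulation by the normalized brightness (V) channel and the "nearest branch pixel above the cluster center" search rule are illustrative assumptions, not the paper's exact formulations.

```python
# Minimal sketch of brightness-prior weighting and the pixel-wise
# picking point search; both functions operate on the cluster-box crop.
import cv2
import numpy as np

def brightness_weighted_iam(crop_bgr: np.ndarray, iam: np.ndarray) -> np.ndarray:
    """crop_bgr: (H, W, 3) image inside the cluster box; iam: (K, H, W) maps."""
    v = cv2.cvtColor(crop_bgr, cv2.COLOR_BGR2HSV)[:, :, 2].astype(np.float32)
    prior = v / max(float(v.max()), 1e-6)    # brightness prior in [0, 1]
    return iam * prior[None, :, :]           # re-weight each activation map

def search_picking_point(branch_mask: np.ndarray, center_xy):
    """branch_mask: (H, W) binary mask of a fruit-connected branch instance."""
    ys, xs = np.nonzero(branch_mask)
    if xs.size == 0:
        raise ValueError("no branch pixels inside the cluster box")
    cx, cy = center_xy
    above = ys < cy                          # pixels above the fruit cluster
    if above.any():                          # citrus hangs below its branch
        ys, xs = ys[above], xs[above]
    d = (xs - cx) ** 2 + (ys - cy) ** 2      # pixel-wise distance search
    i = int(np.argmin(d))
    return int(xs[i]), int(ys[i])
```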
Result
We collected and created the citrus fruit detection dataset (CFDD) and the citrus branch segmentation dataset (CBSD) through long-term outdoor collection. The two datasets consist of mature and immature citrus and include various challenges, such as sunny weather, cloudy weather, front light, and back light, for a total of 37 000 images. We validated the proposed method on these datasets for the citrus fruit detection and branch instance segmentation tasks separately. The fruit detection subtask based on the YOLOv5 model achieved an accuracy of 92.82%. Our branch sparse instance segmentation method (BP-SparseInst) improved branch segmentation performance by 1.15%. Furthermore, our picking point localization method, based on the cascaded model with the cluster segmentation strategy and DBSCAN fruit density clustering, achieved a picking point localization accuracy of 95.77%, an improvement of approximately 4.1% over the previous method, while running in real time at 28.21 frames/s, which is 8.07 frames/s faster than the previous method.
Conclusion
This work makes notable progress in localizing citrus picking points: the proposed method obtains picking points quickly and accurately. Moreover, a matching robotic-arm picking device is provided for the practical deployment of the picking point localization algorithm. These advances provide strong support for the development of the citrus industry.
Keywords: picking robot; picking point positioning method; cluster box generator; brightness prior; branch sparse segmentation model
Bolya D, Zhou C, Xiao F Y and Lee Y J. 2019. YOLACT: real-time instance segmentation//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 9156-9165 [DOI: 10.1109/ICCV.2019.00925]
Cai Z W and Vasconcelos N. 2018. Cascade R-CNN: delving into high quality object detection//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 6154-6162 [DOI: 10.1109/CVPR.2018.00644]
Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A and Zagoruyko S. 2020. End-to-end object detection with Transformers//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 213-229 [DOI: 10.1007/978-3-030-58452-8_13]
Chen C L, Lu J Z, Zhou M C, Yi J, Liao M and Gao Z M. 2022. A YOLOv3-based computer vision system for identification of tea buds and the picking point. Computers and Electronics in Agriculture, 198: #107116 [DOI: 10.1016/j.compag.2022.107116]
Chen L C, Papandreou G, Kokkinos I, Murphy K and Yuille A L. 2018. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4): 834-848 [DOI: 10.1109/TPAMI.2017.2699184]
Chen X F and Li J. 2021. Comparative analysis of road damage detection algorithms based on one-stage object detection. Journal of Computer Applications, 41(S2): 81-85 [DOI: 10.11772/j.issn.1001-9081.2021040573]
Cheng T H, Wang X G, Chen S Y, Zhang W Q, Zhang Q, Huang C, Zhang Z X and Liu W Y. 2022. Sparse instance activation for real-time instance segmentation//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 4423-4432 [DOI: 10.1109/CVPR52688.2022.00439]
Dai J F, He K M, Li Y, Ren S Q and Sun J. 2016. Instance-sensitive fully convolutional networks//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 534-549 [DOI: 10.1007/978-3-319-46466-4_32]
Ester M, Kriegel H P, Sander J and Xu X W. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise//Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining. Portland, USA: AAAI Press: 226-231
Girshick R, Donahue J, Darrell T and Malik J. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 580-587 [DOI: 10.1109/CVPR.2014.81]
He K M, Gkioxari G, Dollár P and Girshick R. 2017. Mask R-CNN//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2980-2988 [DOI: 10.1109/ICCV.2017.322]
He K M, Zhang X Y, Ren S Q and Sun J. 2015. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9): 1904-1916 [DOI: 10.1109/TPAMI.2015.2389824]
Hu X M, Ni B W and Chai J F. 2019. Research on the location of citrus picking point based on structured light camera//Proceedings of 2019 IEEE International Conference on Image, Vision and Computing (ICIVC). Xiamen, China: IEEE: 567-571 [DOI: 10.1109/ICIVC47709.2019.8980938]
Li F, Zhang H, Liu S L, Guo J, Ni L M and Zhang L. 2022. DN-DETR: accelerate DETR training by introducing query denoising//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 13619-13627 [DOI: 10.48550/arXiv.2203.01305]
Li Y, Qi H Z, Dai J F, Ji X Y and Wei Y C. 2017. Fully convolutional instance-aware semantic segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 4438-4446 [DOI: 10.1109/CVPR.2017.472]
Lin T Y, Dollár P, Girshick R, He K M, Hariharan B and Belongie S. 2017a. Feature pyramid networks for object detection//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 936-944 [DOI: 10.1109/CVPR.2017.106]
Lin T Y, Goyal P, Girshick R, He K M and Dollár P. 2017b. Focal loss for dense object detection//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2999-3007 [DOI: 10.1109/ICCV.2017.324]
Liu S, Qi L, Qin H F, Shi J P and Jia J Y. 2018. Path aggregation network for instance segmentation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 8759-8768 [DOI: 10.1109/CVPR.2018.00913]
Liu S L, Li F, Zhang H, Yang X, Qi X B, Su H, Zhu J and Zhang L. 2022. DAB-DETR: dynamic anchor boxes are better queries for DETR [EB/OL]. [2023-10-27]. https://arxiv.org/pdf/2201.12329.pdf
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y and Berg A C. 2016. SSD: single shot multibox detector//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 21-37 [DOI: 10.1007/978-3-319-46448-0_2]
Long J, Shelhamer E and Darrell T. 2015. Fully convolutional networks for semantic segmentation//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 3431-3440 [DOI: 10.1109/CVPR.2015.7298965]
Purkait P, Zhao C and Zach C. 2017. SPP-Net: deep absolute pose regression with synthetic views [EB/OL]. [2023-10-27]. https://arxiv.org/pdf/1712.03452.pdf
Redmon J, Divvala S, Girshick R and Farhadi A. 2016. You only look once: unified, real-time object detection//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 779-788 [DOI: 10.1109/CVPR.2016.91]
Ren S Q, He K M, Girshick R and Sun J. 2017. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6): 1137-1149 [DOI: 10.1109/TPAMI.2016.2577031]
Rong J C, Wang P B, Wang T J, Hu L and Yuan T. 2022. Fruit pose recognition and directional orderly grasping strategies for tomato harvesting robots. Computers and Electronics in Agriculture, 202: #107430 [DOI: 10.1016/j.compag.2022.107430]
Sun B Y. 2021. Research on Tomato Fruit Target Detection and Tomato String Picking Point Location Technology Based on Deep Learning. Tianjin: Tianjin University of Technology [DOI: 10.27360/d.cnki.gtlgy.2021.000534]
Tang Y C, Chen M Y, Wang C L, Luo L F, Lian G P and Zou X J. 2020. Recognition and localization methods for vision-based fruit picking robots: a review. Frontiers in Plant Science, 11: #510 [DOI: 10.3389/fpls.2020.00510]
Wang C Y, Liao H Y M, Wu Y H, Chen P Y, Hsieh J W and Yeh I H. 2020a. CSPNet: a new backbone that can enhance learning capability of CNN//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Seattle, USA: IEEE: 1570-1580 [DOI: 10.1109/CVPRW50498.2020.00203]
Wang X L, Kong T, Shen C H, Jiang Y N and Li L. 2020b. SOLO: segmenting objects by locations//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 649-665 [DOI: 10.1007/978-3-030-58523-5_38]
Zhang H, Li F, Liu S L, Zhang L, Su H, Zhu J, Ni L M and Shum H Y. 2022. DINO: DETR with improved denoising anchor boxes for end-to-end object detection [EB/OL]. [2023-10-27]. https://arxiv.org/pdf/2203.03605.pdf
Zheng C, Chen P F, Pang J, Yang X F, Chen C X, Tu S Q and Xue Y J. 2021. A mango picking vision algorithm on instance segmentation and key point detection from RGB images in an open orchard. Biosystems Engineering, 206: 32-54 [DOI: 10.1016/j.biosystemseng.2021.03.012]
Zhu X Z, Su W J, Lu L W, Li B, Wang X G and Dai J F. 2021. Deformable DETR: deformable Transformers for end-to-end object detection [EB/OL]. [2023-10-27]. https://arxiv.org/pdf/2010.04159.pdf