Small object grasping detection in cluttered scenes
2024, Vol. 29, No. 2, Pages: 468-477
Print publication date: 2024-02-16
DOI: 10.11834/jig.230357
Sun Guodong, Jia Junjie, Li Mingjing, Zhang Yang. 2024. Small object grasping detection in cluttered scenes. Journal of Image and Graphics, 29(2): 468-477
Objective
Grasp pose detection for objects in cluttered scenes is a fundamental skill for intelligent robots. Despite progress in six degrees-of-freedom grasp learning, previous methods have ignored differences in object size during sampling and learning, leading to poor grasping performance on small objects.
Method
We propose an object mask-assisted sampling method that samples the same number of points on every object to balance the grasp distribution, solving the problem of unevenly distributed sampling points. In addition, a multi-scale learning strategy is adopted: multi-scale cylindrical grouping on the partial point clouds of objects improves the local geometric representation and addresses the difficulty of learning grasp operation parameters caused by differences in object scale. By designing an end-to-end grasping network that embeds the proposed sampling and learning methods, object grasp detection performance is effectively improved.
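To make the sampling step concrete, the following minimal sketch (our simplification with hypothetical names, not the authors' code) uses per-point instance masks to draw an equal number of seed points from every object, so a small object contributes as many grasp candidates as a large one.

```python
# Minimal sketch of mask-assisted balanced sampling (illustrative only).
import numpy as np

def mask_assisted_sampling(points, instance_labels, points_per_object=64, rng=None):
    """points: (N, 3) scene cloud; instance_labels: (N,) integer ids, 0 = background.
    Returns indices of a sample balanced across object instances."""
    rng = rng if rng is not None else np.random.default_rng(0)
    chosen = []
    for obj_id in np.unique(instance_labels):
        if obj_id == 0:  # background points are not grasp candidates
            continue
        idx = np.flatnonzero(instance_labels == obj_id)
        # sample with replacement when an object has fewer points than requested
        replace = idx.size < points_per_object
        chosen.append(rng.choice(idx, size=points_per_object, replace=replace))
    return np.concatenate(chosen)

# Toy usage: a large and a small object receive the same number of samples.
points = np.random.rand(1000, 3)
labels = np.zeros(1000, dtype=int)
labels[:900] = 1     # large object: 900 points
labels[900:950] = 2  # small object: 50 points
sample = mask_assisted_sampling(points, labels)
print(np.bincount(labels[sample]))  # -> [0 64 64]
```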
Result
Evaluated on the large-scale benchmark dataset GraspNet-1Billion, the proposed method achieves the best performance among the compared methods, improving the grasping metrics on small objects by 7% on average. Extensive real-robot experiments also show that the method generalizes well to unknown objects.
Conclusion
Focusing on grasping small objects, this paper proposes a mask-assisted sampling method embedded in the proposed end-to-end learning network and introduces a multi-scale grouping learning strategy to improve the local geometric representation of objects. The approach effectively improves grasp quality on small objects, and its grasp evaluation results on all objects surpass those of the compared methods.
Objective
Object grasp pose detection in cluttered scenes is an essential skill for intelligent robots. Despite recent advances in six degrees-of-freedom grasp learning, learning the grasping configuration of small objects remains extremely challenging. First, given the huge amount of raw point cloud data, the scene must be downsampled to reduce the computational complexity of the network and increase detection efficiency. However, previous sampling methods place fewer points on small objects, making it difficult to learn grasp poses for them. In addition, the consumer-grade depth cameras currently on the market are seriously noisy, so the quality of the point clouds captured on small objects cannot be guaranteed; consequently, the objectness that the network predicts for points on small objects may be unclear, and some feasible grasping points are mistakenly treated as background points. This further reduces the number of sampling points on small objects, resulting in weak grasping performance on them.
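To illustrate the sampling bias, the sketch below (our illustration under assumed conditions, not the paper's pipeline) runs plain farthest point sampling over a synthetic scene containing a large and a small object with equal point counts; because FPS spreads samples by spatial extent rather than point count, the small object receives only a handful of the scene-level seeds.

```python
# Illustration of why scene-level downsampling under-samples small objects.
import numpy as np

def farthest_point_sampling(points, k, seed=0):
    """Greedy FPS: repeatedly pick the point farthest from the chosen set."""
    rng = np.random.default_rng(seed)
    picked = [int(rng.integers(points.shape[0]))]
    dists = np.linalg.norm(points - points[picked[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dists))
        picked.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(picked)

rng = np.random.default_rng(0)
large = rng.random((2000, 3))               # large object: wide spatial extent
small = rng.random((2000, 3)) * 0.05 + 2.0  # small object: tiny extent, same point count
scene = np.vstack([large, small])
idx = farthest_point_sampling(scene, 256)
# Both objects contribute half of the scene's points, yet the small object
# typically receives only one or two of the 256 seeds.
print("seeds on the small object:", int((idx >= 2000).sum()))
```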
Method
A potential problem with previous grasp detection methods is that they do not account for the biased distribution of sampling points caused by differences in object scale within the scene, which leaves fewer sampling points on small objects. In this study, we propose an object mask-assisted sampling method that samples the same number of points on every object to balance the grasp distribution, solving the problem of unevenly distributed sampling points. At inference time, where point-level scene masks are not available a priori, we introduce an unseen object instance segmentation network to distinguish the objects in the scene, enabling the mask-assisted sampling. In addition, a multi-scale learning strategy is used during learning: multi-scale cylindrical grouping is applied to the partial point clouds of objects to improve the local geometric representation, addressing the difficulty of learning grasp operation parameters caused by differences in object scale. Specifically, we set up three cylinders with different radii to sample the point cloud near each graspable point, corresponding to learning the features of large, medium, and small objects, and then concatenate the features of the three scales. The concatenated features are processed by a self-attention layer to enhance attention to the local region and improve the local geometric representation of the object. Similar to GraspNet, we design an end-to-end grasping network consisting of three parts: graspable point prediction, approach direction prediction, and gripper operation prediction. Graspable points are the high-scoring points in the scene that are suitable for grasping; they provide an initial filtering of the large amount of scene point cloud data and are then passed to the proposed sampling and learning modules to further predict the approach direction and gripper operation of the grasp poses on an object. By designing an end-to-end grasping network that embeds the proposed sampling and learning approach, we effectively improve object grasp detection capability.
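The multi-scale grouping described above can be pictured with the following hedged PyTorch sketch; the radii, feature widths, and module names are our assumptions for illustration, not the authors' exact architecture. Around each graspable seed point, neighbors are collected inside three coaxial cylinders aligned with the predicted approach direction, encoded per scale by a shared MLP with max pooling, concatenated, and refined with a self-attention layer.

```python
import torch
import torch.nn as nn

def cylinder_group(points, seeds, axes, radius, height, k):
    """points: (N, 3); seeds: (M, 3); axes: (M, 3) unit approach directions.
    Returns (M, k, 3) seed-centered neighbors inside each cylinder, zero-padded."""
    rel = points[None, :, :] - seeds[:, None, :]             # (M, N, 3)
    axial = (rel * axes[:, None, :]).sum(-1)                 # signed distance along axis
    radial = (rel - axial[..., None] * axes[:, None, :]).norm(dim=-1)
    inside = (radial < radius) & (axial.abs() < height / 2)
    grouped = torch.zeros(seeds.shape[0], k, 3)
    for m in range(seeds.shape[0]):                          # plain loop for clarity
        idx = inside[m].nonzero(as_tuple=True)[0][:k]
        grouped[m, : idx.numel()] = rel[m, idx]
    return grouped

class MultiScaleCylinderEncoder(nn.Module):
    """Groups points in three coaxial cylinders per seed, encodes each scale,
    concatenates the features, and applies a self-attention layer."""
    def __init__(self, radii=(0.02, 0.04, 0.08), height=0.04, k=64, feat=128):
        super().__init__()
        self.radii, self.height, self.k = radii, height, k
        self.mlps = nn.ModuleList(
            nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, feat))
            for _ in radii)
        self.attn = nn.MultiheadAttention(feat * len(radii), num_heads=4,
                                          batch_first=True)

    def forward(self, points, seeds, axes):
        per_scale = []
        for radius, mlp in zip(self.radii, self.mlps):
            g = cylinder_group(points, seeds, axes, radius, self.height, self.k)
            per_scale.append(mlp(g).max(dim=1).values)       # (M, feat) per scale
        x = torch.cat(per_scale, dim=-1).unsqueeze(0)        # (1, M, 3 * feat)
        out, _ = self.attn(x, x, x)                          # self-attention
        return out.squeeze(0)                                # (M, 3 * feat)
```

In a real pipeline the grouping would run batched on the GPU; the explicit loop here trades speed for readability.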
Result
The proposed method achieves state-of-the-art performance when evaluated on the large-scale benchmark dataset GraspNet-1Billion, improving the grasping metrics on small objects by 7% on average, and a large number of real-robot experiments show that the approach exhibits promising generalization to unseen objects. To observe the improvement on small objects more intuitively, we also take the most representative previous method, the graspness-based sampling network (GSNet), as the baseline and visualize the grasp detection results of the baseline and the proposed method in four cluttered scenes. The visualizations show that the previous method tends to predict grasps on the large objects in the scene and fails to produce reasonable grasp poses on some small objects, whereas the proposed method accurately predicts grasp poses on small objects.
Conclusion
Focusing on grasping small objects, this study proposes a mask-assisted sampling method embedded in the proposed end-to-end learning network and introduces a multi-scale grouping learning strategy to improve the local geometric representation of objects, effectively improving the quality of grasping small objects and outperforming previous methods in the evaluation of grasping all objects. However, the proposed method has certain limitations. For example, when noisy, low-quality depth maps are used as input, existing unseen object instance segmentation methods may produce incorrect object masks, causing mask-assisted sampling to fail. In the future, we plan to investigate more robust unseen object instance segmentation methods that can correct erroneous segmentation results under low-quality depth input. This will allow us to obtain more accurate object instance masks and further enhance grasp detection in cluttered scenes.
six degrees-of-freedom grasping; sampling strategy; multiscale learning; point cloud learning; deep learning
Chu F J, Xu R N and Vela P A. 2018. Real-world multiobject, multigrasp detection. IEEE Robotics and Automation Letters, 3(4): 3355-3362 [DOI: 10.1109/LRA.2018.2852777]
Fang H S, Wang C X, Gou M H and Lu C W. 2020. GraspNet-1Billion: a large-scale benchmark for general object grasping//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 11441-11450 [DOI: 10.1109/CVPR42600.2020.01146]
Fischinger D, Weiss A and Vincze M. 2015. Learning grasps with topographic features. The International Journal of Robotics Research, 34(9): 1167-1194 [DOI: 10.1177/0278364915577105]
González Á. 2010. Measurement of areas on a sphere using Fibonacci and latitude-longitude lattices. Mathematical Geosciences, 42(1): 49-64 [DOI: 10.1007/s11004-009-9257-x]
Gou M H, Fang H S, Zhu Z D, Xu S, Wang C X and Lu C W. 2021. RGB matters: learning 7-DoF grasp poses on monocular RGBD images//Proceedings of 2021 IEEE International Conference on Robotics and Automation (ICRA). Xi’an, China: IEEE: 13459-13466 [DOI: 10.1109/ICRA48506.2021.9561409]
Guo M H, Cai J X, Liu Z N, Mu T J, Martin R R and Hu S M. 2021. PCT: point cloud transformer. Computational Visual Media, 7(2): 187-199 [DOI: 10.1007/s41095-021-0229-5]
Li Y Y, Bu R, Sun M C, Wu W, Di X H and Chen B Q. 2018. PointCNN: convolution on X-transformed points [EB/OL]. [2023-06-05]. https://arxiv.org/pdf/1801.07791.pdf
Liang H Z, Ma X J, Li S, Görner M, Tang S, Fang B, Sun F C and Zhang J W. 2019. PointNetGPD: detecting grasp configurations from point sets//Proceedings of 2019 International Conference on Robotics and Automation (ICRA). Montreal, Canada: IEEE: 3629-3635 [DOI: 10.1109/ICRA.2019.8794435]
Lu Y H, Deng B X, Wang Z Y, Zhi P Y, Li L L and Wang S J. 2022. Hybrid physical metric for 6-DoF grasp pose detection//Proceedings of 2022 International Conference on Robotics and Automation (ICRA). Philadelphia, USA: IEEE: 8238-8244 [DOI: 10.1109/ICRA46639.2022.9811961]
Ma H X and Huang D. 2022. Towards scale balanced 6-DoF grasp detection in cluttered scenes [EB/OL]. [2023-06-05]. https://arxiv.org/pdf/2212.05275.pdf
Mahler J, Matl M, Satish V, Danielczuk M, DeRose B, McKinley S and Goldberg K. 2019. Learning ambidextrous robot grasping policies. Science Robotics, 4(26): #4984 [DOI: 10.1126/scirobotics.aau4984]
Morrison D, Corke P and Leitner J. 2018. Closing the loop for robotic grasping: a real-time, generative grasp synthesis approach [EB/OL]. [2023-06-05]. https://arxiv.org/pdf/1804.05172.pdf
Mousavian A, Eppner C and Fox D. 2019. 6-DOF GraspNet: variational grasp generation for object manipulation//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 2901-2910 [DOI: 10.1109/ICCV.2019.00299]
Nguyen V D. 1988. Constructing force-closure grasps. The International Journal of Robotics Research, 7(3): 3-16 [DOI: 10.1177/027836498800700301]
Ni P Y, Zhang W G, Zhu X X and Cao Q X. 2020. PointNet++ grasping: learning an end-to-end spatial grasp generation algorithm from sparse point clouds//Proceedings of 2020 IEEE International Conference on Robotics and Automation (ICRA). Paris, France: IEEE: 3619-3625 [DOI: 10.1109/ICRA40945.2020.9196740]
Qi C R, Su H, Mo K C and Guibas L J. 2017a. PointNet: deep learning on point sets for 3D classification and segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 77-85 [DOI: 10.1109/CVPR.2017.16]
Qi C R, Yi L, Su H and Guibas L J. 2017b. PointNet++: deep hierarchical feature learning on point sets in a metric space//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 5105-5114
Rakotosaona M J, La Barbera V, Guerrero P, Mitra N J and Ovsjanikov M. 2020. PointCleanNet: learning to denoise and remove outliers from dense point clouds. Computer Graphics Forum, 39(1): 185-203 [DOI: 10.1111/cgf.13753]
Sundermeyer M, Mousavian A, Triebel R and Fox D. 2021. Contact-GraspNet: efficient 6-DoF grasp generation in cluttered scenes//Proceedings of 2021 IEEE International Conference on Robotics and Automation. Xi’an, China: IEEE: 13438-13444 [DOI: 10.1109/ICRA48506.2021.9561877]
Ten Pas A, Gualtieri M, Saenko K and Platt R. 2017. Grasp pose detection in point clouds. The International Journal of Robotics Research, 36(13/14): 1455-1473 [DOI: 10.1177/0278364917735594]
Wang C X, Fang H S, Gou M H, Fang H J, Gao J and Lu C W. 2021. Graspness discovery in clutters for fast and accurate grasp detection//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 15944-15953 [DOI: 10.1109/ICCV48922.2021.01566]
Wu W X, Qi Z G and Li F X. 2019. PointConv: deep convolutional networks on 3D point clouds//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 9613-9622 [DOI: 10.1109/CVPR.2019.00985]
Xiang Y, Xie C, Mousavian A and Fox D. 2021. Learning RGB-D feature embeddings for unseen object instance segmentation [EB/OL]. [2023-06-05]. https://arxiv.org/pdf/2007.15157.pdf
Xu R N, Chu F J and Vela P A. 2022. GKNet: grasp keypoint network for grasp candidates detection. The International Journal of Robotics Research, 41(4): 361-389 [DOI: 10.1177/02783649211069569]
Yan M, Tao D P and Pu Y Y. 2022. Texture-less object detection method for industrial components picking system. Journal of Image and Graphics, 27(8): 2418-2429 [DOI: 10.11834/jig.210088]
Yuan W T, Khot T, Held D, Mertz C and Hebert M. 2018. PCN: point completion network//Proceedings of 2018 International Conference on 3D Vision (3DV). Verona, Italy: IEEE: 728-737 [DOI: 10.1109/3DV.2018.00088]
Zhao H S, Jiang L, Jia J Y, Torr P and Koltun V. 2021. Point transformer//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 16239-16248 [DOI: 10.1109/ICCV48922.2021.01595]
Zhou Q Y, Park J and Koltun V. 2018. Open3D: a modern library for 3D data processing [EB/OL]. [2023-06-05]. https://arxiv.org/pdf/1801.09847.pdf