Fast convergence network for target pose tracking driven by synthetic data
Vol. 29, Issue 1, Pages: 147-162 (2024)
Published: 16 January 2024
DOI: 10.11834/jig.230096
彭泓, 王骞, 贾迪, 赵金源, 庞宇恒. 2024. 合成数据驱动目标姿态追踪的快速收敛网络. 中国图象图形学报, 29(01):0147-0162
Peng Hong, Wang Qian, Jia Di, Zhao Jinyuan, Pang Yuheng. 2024. Fast convergence network for target pose tracking driven by synthetic data. Journal of Image and Graphics, 29(01):0147-0162
Objective
Affected by occlusion and accumulated error, existing real-time 6D (6 dimensions) pose tracking methods perform poorly in complex scenes. To address this, a highly robust real-time 6D pose tracking network for rigid targets is proposed.
Method
In the overall network design, the current frame's color and depth images (RGB-D, red green blue-depth map) and the previous frame's pose estimate are processed by dimension-raising residual sampling filtering and feature encoding to obtain the pose difference, which is combined with the previous frame's pose estimate to compute the target's current 6D pose. In the design of the residual sampling filtering module, the self-gated swish activation function (searching for activation functions) is adopted to retain detailed target features and improve the accuracy of pose tracking. In the design of the feature aggregation module, the extracted features are decomposed into horizontal and vertical components, which capture long-range dependencies in time and space while preserving position information, generating a set of complementary feature maps with position and time awareness; this strengthens the target feature extraction capability and thereby accelerates network convergence.
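To make the two module-level ideas above concrete, the following is a minimal PyTorch sketch of a self-gated swish activation and a direction-decomposed feature aggregation block. The layer sizes, reduction ratio, and pooling/encoding details are assumptions made for illustration and are not taken from the authors' implementation.

```python
# Illustrative sketch only (not the authors' code): swish activation and a
# coordinate-attention-style aggregation over horizontal / vertical directions.
import torch
import torch.nn as nn

class Swish(nn.Module):
    """Self-gated swish activation: x * sigmoid(x)."""
    def forward(self, x):
        return x * torch.sigmoid(x)

class DirectionalAggregation(nn.Module):
    """Decompose features into horizontal and vertical components, encode them
    jointly as a 1D descriptor, and return complementary position-aware
    attention maps applied back to the input."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        mid = max(channels // reduction, 8)
        self.encode = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1),
            nn.BatchNorm2d(mid),
            Swish(),
        )
        self.attn_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.attn_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        # 1D descriptors along each spatial direction
        feat_h = x.mean(dim=3, keepdim=True)                        # (b, c, h, 1)
        feat_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)    # (b, c, w, 1)
        y = self.encode(torch.cat([feat_h, feat_w], dim=2))         # joint 1D encoding
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.attn_h(y_h))                       # (b, c, h, 1)
        a_w = torch.sigmoid(self.attn_w(y_w.permute(0, 1, 3, 2)))   # (b, c, 1, w)
        return x * a_h * a_w                                        # complementary attention maps
```

For example, `DirectionalAggregation(64)(torch.randn(2, 64, 88, 88))` returns a re-weighted feature map of the same shape; the two attention maps carry the horizontal and vertical position cues described above.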
Result
The YCB-Video (Yale-CMU-Berkeley video) and YCBInEoAT (Yale-CMU-Berkeley in end-of-arm-tooling) datasets are used in the experiments. The results show that the tracking speed of the proposed method reaches 90.9 Hz, and its tracking accuracy in terms of the average distance of model points (ADD) and the average closest point distance (ADD-S) reaches 93.24 and 95.84, respectively, both higher than comparable methods. The proposed method leads other current rigid-body pose tracking methods in both tracking accuracy and tracking speed. Compared with the se(3)-TrackNet network, its ADD and ADD-S are 25.95 and 30.91 higher when trained on only 6 000 sets of synthetic data, 31.72 and 28.75 higher with 8 000 sets, and 35.57 and 21.07 higher with 10 000 sets, and it achieves highly robust 6D pose tracking of the target under severe occlusion.
Conclusion
Driven by synthetic data, the proposed network tracks the 6D pose of targets accurately in real time and converges quickly; the experimental results verify the effectiveness of the method.
Objective
Rigid object pose estimation is one of the fundamental and most challenging problems in computer vision and has garnered substantial attention in recent years. Researchers seek methods that recover the degrees of freedom of a rigid object in a 3D scene, namely its translation and rotation. Progress in rigid object pose estimation has been considerable alongside the development of computer vision techniques, and the task has become increasingly important in applications such as robotics, on-orbit servicing, autonomous driving, and augmented reality. Rigid object pose estimation can be divided into two stages: the traditional stage (e.g., feature-based, template matching, and 3D coordinate-based methods) and the deep learning-based stage (e.g., improved traditional methods and direct or indirect estimation methods). Although existing methods and their improved variants achieve high tracking accuracy, their precision deteriorates substantially when they are applied to new scenes or novel target objects, and they perform poorly in complex environments. In such cases, a large amount of training data is required for deep learning across multiple scenarios, incurring high costs for data collection and network training. To address this issue, this paper proposes a real-time 6D pose tracking network for rigid objects with fast convergence and high robustness, driven by synthetic data. The network provides long-term stable 6D pose tracking of target rigid objects while greatly reducing the cost of data collection and the time required for network convergence.
Method
The network convergence speed is improved mainly through the overall network design, the residual sampling filtering module, and the feature aggregation module. The rigid 6D pose transformation is computed using Lie algebra and Lie group theory. The current frame's RGB-D image and the previous frame's pose estimate are transformed into a pair of 4D tensors and fed into the network. The pose difference is obtained through the residual sampling filtering module and a feature encoder, and the current 6D pose of the target is computed jointly with the previous frame's pose estimate. In the design of the residual sampling filtering module, the self-gated swish activation function is used to retain detailed target features, and the translation and rotation are obtained by decoupling the target pose through the feature encoder and decoder, which improves the accuracy of target pose tracking. In the design of the feature aggregation module, the features are decomposed into horizontal and vertical components and aggregated into a 1D feature encoding that captures long-range dependencies in both time and space while preserving position information. A set of complementary feature maps with position and time awareness is generated, strengthening the target feature extraction ability and thereby accelerating the convergence of the network.
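The pose-update step described above can be illustrated with a short numpy sketch: the relative transform predicted by the network is treated as a 6-vector in the Lie algebra se(3), mapped to SE(3) with the exponential map, and composed with the previous frame's pose. The [omega | v] vector layout and the left-multiplicative composition order are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch (not the authors' code): composing a predicted relative pose,
# expressed in se(3), with the previous frame's pose to get the current pose.
import numpy as np

def se3_exp(xi):
    """Exponential map from a 6-vector xi = [omega, v] in se(3) to a 4x4 SE(3) matrix."""
    omega, v = xi[:3], xi[3:]
    theta = np.linalg.norm(omega)
    K = np.array([[0, -omega[2], omega[1]],
                  [omega[2], 0, -omega[0]],
                  [-omega[1], omega[0], 0]])
    if theta < 1e-8:                      # near-zero rotation: first-order fallback
        R, V = np.eye(3) + K, np.eye(3)
    else:
        R = (np.eye(3) + np.sin(theta) / theta * K
             + (1 - np.cos(theta)) / theta**2 * K @ K)
        V = (np.eye(3) + (1 - np.cos(theta)) / theta**2 * K
             + (theta - np.sin(theta)) / theta**3 * K @ K)
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, V @ v
    return T

def update_pose(prev_pose, delta_xi):
    """Current pose = predicted relative transform applied to the previous pose
    (left-multiplicative update assumed for illustration)."""
    return se3_exp(delta_xi) @ prev_pose
```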
Result
To ensure consistent training and testing environments, all experiments are conducted on a desktop computer with an Intel Core i7-8700@3.2 GHz processor and an NVIDIA RTX 3060 GPU. Each target in the complete dataset contains approximately 23 000 sets of images with a size of 176 × 176 pixels, totaling about 15 GB. During training and validation, the batch size is set to 80 and the model is trained for 300 epochs. The initial learning rate is 0.01, with decay factors of 0.9 and 0.99 applied from the 100th and the 200th epochs, respectively. When evaluating tracking performance, the average distance of model points (ADD) metric is commonly used to assess pose estimation accuracy for non-symmetric objects: the Euclidean distance between each predicted model point and its corresponding ground-truth point is computed, and these distances are averaged. However, the ADD metric is not suitable for symmetric objects, because multiple correct poses may exist for a symmetric object in the same image. In such cases, the ADD-S metric is used: for each model point under the predicted pose, the distance to the closest model point under the ground-truth pose is computed, and these closest-point distances are averaged, which is more appropriate for evaluating the pose tracking of symmetric objects. The Yale-CMU-Berkeley video (YCB-Video) dataset and the Yale-CMU-Berkeley in end-of-arm-tooling (YCBInEoAT) dataset are used to evaluate the performance of the relevant methods. The YCB-Video dataset contains complex scenes captured by a moving camera under severe occlusion, whereas the YCBInEoAT dataset involves rigid objects manipulated by a robotic arm. The two datasets are used to validate the generality and robustness of the network across different scenarios. Experimental results show that the tracking speed of the proposed method reaches 90.9 Hz, and its ADD and average closest point distance (ADD-S) reach 93.24 and 95.84, respectively, both higher than those of similar related methods. Compared with se(3)-TrackNet, previously the most accurate related tracking method, the ADD and ADD-S of the proposed method are 25.95 and 30.91 higher under the condition of 6 000 sets of synthetic training data, 31.72 and 28.75 higher with 8 000 sets, and 35.57 and 21.07 higher with 10 000 sets, respectively. The method achieves highly robust 6D pose tracking of targets in severely occluded scenes.
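For reference, the two evaluation metrics described above can be computed as in the following numpy sketch; the function names are our own, and the thresholding or area-under-curve aggregation typically used to obtain the reported scores is omitted.

```python
# Illustrative computation of the ADD and ADD-S metrics (sketch only).
import numpy as np

def transform(points, pose):
    """Apply a 4x4 pose to an (N, 3) array of model points."""
    return points @ pose[:3, :3].T + pose[:3, 3]

def add_metric(points, pose_pred, pose_gt):
    """ADD: mean distance between corresponding model points under the two poses."""
    p_pred, p_gt = transform(points, pose_pred), transform(points, pose_gt)
    return np.linalg.norm(p_pred - p_gt, axis=1).mean()

def adds_metric(points, pose_pred, pose_gt):
    """ADD-S: for symmetric objects, mean distance from each predicted point
    to its closest ground-truth point."""
    p_pred, p_gt = transform(points, pose_pred), transform(points, pose_gt)
    d = np.linalg.norm(p_pred[:, None, :] - p_gt[None, :, :], axis=2)  # pairwise distances
    return d.min(axis=1).mean()
```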
Conclusion
A novel fast-converging network is proposed for tracking the pose of rigid objects, combining the residual sampling filtering module and the feature aggregation module. The network provides long-term, effective 6D pose tracking of objects with only one initialization. Using a small amount of synthetic data, it quickly reaches convergence and achieves strong performance in complex scenes, including severe occlusion and drastic displacement, while demonstrating outstanding real-time tracking efficiency and accuracy. Experimental results on different datasets validate the superiority and reliability of this approach. In future work, we will continue to optimize the model, further improve object tracking accuracy and network convergence speed, address the limitation of requiring computer-aided design (CAD) models, and achieve category-level pose tracking.
6D pose estimation; real-time tracking; synthetic data; image processing; feature fusion
Chen D S, Li J, Wang Z and Xu K. 2020. Learning canonical shape space for category-level 6D object pose and size estimation//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 11970-11979 [DOI: 10.1109/CVPR42600.2020.01199]
Chen W, Jia X, Chang H J, Duan J M, Shen L L and Leonardis A. 2021. FS-Net: fast shape-based network for category-level 6D object pose estimation with decoupled rotation mechanism//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 1581-1590 [DOI: 10.1109/CVPR46437.2021.00163]
Choi C and Christensen H I. 2010. Real-time 3D model-based tracking using edge and keypoint features for robotic manipulation//Proceedings of 2010 IEEE International Conference on Robotics and Automation. Anchorage, USA: IEEE: 4048-4055 [DOI: 10.1109/ROBOT.2010.5509171]
Choi C and Christensen H I. 2012. 3D textureless object detection and tracking: an edge-based approach//Proceedings of 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. Vilamoura-Algarve, Portugal: IEEE: 3877-3884 [DOI: 10.1109/IROS.2012.6386065]
Collet A, Martinez M and Srinivasa S S. 2011. The MOPED framework: object recognition and pose estimation for manipulation. The International Journal of Robotics Research, 30(10): 1284-1306 [DOI: 10.1177/0278364911401765]
Deng X K, Mousavian A, Xiang Y, Xia F, Bretl T and Fox D. 2019. PoseRBPF: a rao-blackwellized particle filter for 6D object pose tracking//Proceedings of the 15th Robotics: Science and Systems. Freiburg im Breisgau, Germany: MIT: 49-59 [DOI: 10.15607/RSS.2019.XV.049]
Deng X K, Xiang Y, Mousavian A, Eppner C, Bretl T and Fox D. 2020. Self-supervised 6D object pose estimation for robot manipulation//Proceedings of 2020 IEEE International Conference on Robotics and Automation. Paris, France: IEEE: 3665-3671 [DOI: 10.1109/ICRA40945.2020.9196714]
Dong Y C, Ji L L, Wang S B, Gong P, Yue J G, Shen R J, Chen C and Zhang Y P. 2021. Accurate 6DOF pose tracking for texture-less objects. IEEE Transactions on Circuits and Systems for Video Technology, 31(5): 1834-1848 [DOI: 10.1109/TCSVT.2020.3011737]
Dosovitskiy A, Fischer P, Ilg E, Häusser P, Hazirbas C, Golkov V, van der Smagt P, Cremers D and Brox T. 2015. FlowNet: learning optical flow with convolutional networks//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 2758-2766 [DOI: 10.1109/ICCV.2015.316]
Drost B, Ulrich M, Navab N and Ilic S. 2010. Model globally, match locally: efficient and robust 3D object recognition//Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, USA: IEEE: 998-1005 [DOI: 10.1109/CVPR.2010.5540108]
Engel J, Koltun V and Cremers D. 2018. Direct sparse odometry. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(3): 611-625 [DOI: 10.1109/TPAMI.2017.2658577]
Ge R D and Loianno G. 2021. VIPose: real-time visual-inertial 6D object pose tracking//Proceedings of 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems. Prague, Czech Republic: IEEE: 4597-4603 [DOI: 10.1109/IROS51168.2021.9636283]
Glorot X, Bordes A and Bengio Y. 2011. Deep sparse rectifier neural networks//Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. Fort Lauderdale, USA: JMLR: 315-323
Guo J W, Xing X J, Quan W Z, Yan D M, Gu Q Y, Liu Y and Zhang X P. 2021. Efficient center voting for object detection and 6D pose estimation in 3D point cloud. IEEE Transactions on Image Processing, 30: 5072-5084 [DOI: 10.1109/TIP.2021.3078109]
He Y S, Sun W, Huang H B, Liu J R, Fan H Q and Sun J. 2020. PVN3D: a deep point-wise 3D keypoints voting network for 6DoF pose estimation//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 11629-11638 [DOI: 10.1109/CVPR42600.2020.01165]
Hinterstoisser S, Cagniart C, Ilic S, Sturm P, Navab N, Fua P and Lepetit V. 2012. Gradient response maps for real-time detection of textureless objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(5): 876-888 [DOI: 10.1109/TPAMI.2011.206]
Issac J, Wüthrich M, Cifuentes C G, Bohg J, Trimpe S and Schaal S. 2016. Depth-based object tracking using a robust Gaussian filter//Proceedings of 2016 IEEE International Conference on Robotics and Automation. Stockholm, Sweden: IEEE: 608-615 [DOI: 10.1109/ICRA.2016.7487184]
Kehl W, Manhardt F, Tombari F, Ilic S and Navab N. 2017. SSD-6D: making RGB-based 3D detection and 6D pose estimation great again//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 1530-1538 [DOI: 10.1109/ICCV.2017.169]
Li D D, Zheng H R, Liu F C and Pan X. 2022. 6D pose estimation based on mask location and hourglass network. Journal of Image and Graphics, 27(2): 642-652
李冬冬, 郑河荣, 刘复昌, 潘翔. 2022. 结合掩码定位和漏斗网络的6D姿态估计. 中国图象图形学报, 27(2): 642-652 [DOI: 10.11834/jig.200525]
Li Y, Wang G, Ji X Y, Xiang Y and Fox D. 2020. DeepIM: deep iterative matching for 6D pose estimation. International Journal of Computer Vision, 128(3): 657-678 [DOI: 10.1007/s11263-019-01250-9]
Li Z G, Wang G and Ji X Y. 2019. CDPN: coordinates-based disentangled pose network for real-time RGB-based 6-DoF object pose estimation//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 7677-7686 [DOI: 10.1109/ICCV.2019.00777]
Liu J, Sun W, Liu C P, Zhang X, Fan S M and Wu W. 2022. HFF6D: hierarchical feature fusion network for robust 6D object pose tracking. IEEE Transactions on Circuits and Systems for Video Technology, 32(11): 7719-7731 [DOI: 10.1109/TCSVT.2022.3181597]
Liu Y Y, Peng J Y, Dai W, Zeng J B and Shan S G. 2023a. Joint spatial and scale attention network for multi-view facial expression recognition. Pattern Recognition, 139: #109496 [DOI: 10.1016/j.patcog.2023.109496]
Liu Y Y, Wang W B, Feng C X, Zhang H Y, Chen Z and Zhan Y B. 2023b. Expression snippet transformer for robust video-based facial expression recognition. Pattern Recognition, 138: #109368 [DOI: 10.1016/j.patcog.2023.109368]
Liu Y Y, Zhou N, Zhang F Y, Wang W B, Wang Y, Liu K J and Liu Z Y. 2023c. APSL: action-positive separation learning for unsupervised temporal action localization. Information Sciences, 630: 206-221 [DOI: 10.1016/j.ins.2023.02.047]
Marougkas I, Koutras P, Kardaris N, Retsinas G, Chalvatzaki G and Maragos P. 2020. How to track your dragon: a multi-attentional framework for real-time RGB-D 6-DOF object pose tracking//Proceedings of 2020 European Conference on Computer Vision. Glasgow, UK: Springer: 682-699 [DOI: 10.1007/978-3-030-66096-3_45]
Mitash C, Bekris K E and Boularias A. 2017. A self-supervised learning system for object detection using physics simulation and multi-view pose estimation//Proceedings of 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems. Vancouver, Canada: IEEE: 545-551 [DOI: 10.1109/IROS.2017.8202206]
Mitash C, Wen B W, Bekris K and Boularias A. 2019. Scene-level pose estimation for multiple instances of densely packed objects//Proceedings of 2019 Conference on Robot Learning. PMLR: 1133-1145
Pauwels K, Rubio L and Ros E. 2016. Real-time pose detection and tracking of hundreds of objects. IEEE Transactions on Circuits and Systems for Video Technology, 26(12): 2200-2214 [DOI: 10.1109/TCSVT.2015.2430652]
Peng S D, Liu Y, Huang Q X, Zhou X W and Bao H J. 2019. PVNet: pixel-wise voting network for 6DoF pose estimation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 4556-4565 [DOI: 10.1109/CVPR.2019.00469]
Prisacariu V A and Reid I D. 2012. PWP3D: real-time segmentation and tracking of 3D objects. International Journal of Computer Vision, 98(3): 335-354 [DOI: 10.1007/s11263-011-0514-3]
Ramachandran P, Zoph B and Le Q V. 2018. Searching for activation functions//Proceedings of the 6th International Conference on Learning Representations. Vancouver, Canada [DOI: 10.48550/arXiv.1710.05941]
Redmon J, Divvala S, Girshick R and Farhadi A. 2016. You only look once: unified, real-time object detection//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 779-788 [DOI: 10.1109/CVPR.2016.91]
Schmidt T, Newcombe R and Fox D. 2014. DART: dense articulated real-time tracking//Proceedings of 2014 Robotics: Science and Systems. California: IEEE: 2(1): 1-9
Sun X L, Zhou J X, Zhang W L, Wang Z and Yu Q F. 2021. Robust monocular pose tracking of less-distinct objects based on contour-part model. IEEE Transactions on Circuits and Systems for Video Technology, 31(11): 4409-4421 [DOI: 10.1109/TCSVT.2021.3053696]
Sundermeyer M, Marton Z C, Durner M, Brucker M and Triebel R. 2018. Implicit 3D orientation learning for 6D object detection from RGB images//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 712-729 [DOI: 10.1007/978-3-030-01231-1_43]
Tekin B, Sinha S N and Fua P. 2018. Real-time seamless single shot 6D object pose prediction//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 292-301 [DOI: 10.1109/CVPR.2018.00038]
Tjaden H, Schwanecke U and Schömer E. 2016. Real-time monocular segmentation and pose tracking of multiple objects//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 423-438 [DOI: 10.1007/978-3-319-46493-0_26]
Tjaden H, Schwanecke U and Schömer E. 2017. Real-time monocular pose estimation of 3D objects using temporally consistent local color histograms//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 124-132 [DOI: 10.1109/ICCV.2017.23]
Tobin J, Fong R, Ray A, Schneider J, Zaremba W and Abbeel P. 2017. Domain randomization for transferring deep neural networks from simulation to the real world//Proceedings of 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems. Vancouver, Canada: IEEE: 23-30 [DOI: 10.1109/IROS.2017.8202133]
Tremblay J, To T and Birchfield S. 2018a. Falling things: a synthetic dataset for 3D object detection and pose estimation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Salt Lake City, USA: IEEE: 2038-2041 [DOI: 10.1109/CVPRW.2018.00275]
Tremblay J, To T, Sundaralingam B, Xiang Y, Fox D and Birchfield S. 2018b. Deep object pose estimation for semantic robotic grasping of household objects//Proceedings of the 2nd Conference on Robot Learning. Zurich, Switzerland: PMLR: 306-316
Wang C, Martín-Martín R, Xu D F, Lyu J, Lu C W, Li F F, Savarese S and Zhu Y K. 2020. 6-PACK: category-level 6D pose tracker with anchor-based keypoints//Proceedings of 2020 IEEE International Conference on Robotics and Automation. Paris, France: IEEE: 10059-10066 [DOI: 10.1109/ICRA40945.2020.9196679]
Wang C, Xu D F, Zhu Y K, Martín-Martín R, Lu C W, Li F F and Savarese S. 2019a. DenseFusion: 6D object pose estimation by iterative dense fusion//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 3338-3347 [DOI: 10.1109/CVPR.2019.00346]
Wang H, Sridhar S, Huang J W, Valentin J, Song S and Guibas L J. 2019b. Normalized object coordinate space for category-level 6D object pose and size estimation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 2637-2646 [DOI: 10.1109/CVPR.2019.00275]
Wen B W and Bekris K. 2021. BundleTrack: 6D pose tracking for novel objects without instance or category-level 3D models//Proceedings of 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems. Prague, Czech Republic: IEEE: 8067-8074 [DOI: 10.1109/IROS51168.2021.9635991]
Wen B W, Mitash C, Ren B Z and Bekris K E. 2020a. se(3)-TrackNet: data-driven 6D pose tracking by calibrating image residuals in synthetic domains//Proceedings of 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems. Las Vegas, USA: IEEE: 10367-10373 [DOI: 10.1109/IROS45743.2020.9341314]
Wen B W, Mitash C, Soorian S, Kimmel A, Sintov A and Bekris K E. 2020b. Robust, occlusion-aware pose estimation for objects grasped by adaptive hands//Proceedings of 2020 IEEE International Conference on Robotics and Automation. Paris, France: IEEE: 6210-6217 [DOI: 10.1109/ICRA40945.2020.9197350]
Wüthrich M, Pastor P, Kalakrishnan M, Bohg J and Schaal S. 2013. Probabilistic object tracking using a range camera//Proceedings of 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems. Tokyo, Japan: IEEE: 3195-3202 [DOI: 10.1109/IROS.2013.6696810]
Xiang Y, Schmidt T, Narayanan V and Fox D. 2018. PoseCNN: a convolutional neural network for 6D object pose estimation in cluttered scenes//Proceedings of 2018 Robotics: Science and Systems. Pittsburgh: 19-29 [DOI: 10.15607/RSS.2018.XIV.019]
Yang B Y, Du X P, Fang Y Q, Li P Y and Wang Y. 2021. Review of rigid object pose estimation from a single image. Journal of Image and Graphics, 26(2): 334-354
杨步一, 杜小平, 方宇强, 李佩阳, 王阳. 2021. 单幅图像刚体目标姿态估计方法综述. 中国图象图形学报, 26(2): 334-354 [DOI: 10.11834/jig.200037]
Zhong L S, Lu M and Zhang L. 2018. A direct 3D object tracking method based on dynamic textured model rendering and extended dense feature fields. IEEE Transactions on Circuits and Systems for Video Technology, 28(9): 2302-2315 [DOI: 10.1109/TCSVT.2017.2731519]
Zhou G L, Yan Y, Wang D M and Chen Q J. 2021. A novel depth and color feature fusion framework for 6D object pose estimation. IEEE Transactions on Multimedia, 23: 1630-1639 [DOI: 10.1109/TMM.2020.3001533]
Zhu X F, Wu X J, Xu T Y, Feng Z H and Kittler J. 2021. Complementary discriminative correlation filters based on collaborative representation for visual object tracking. IEEE Transactions on Circuits and Systems for Video Technology, 31(2): 557-568 [DOI: 10.1109/TCSVT.2020.2979480]