Review of 2D human pose encoding and decoding methods: from the perspective of ambiguity mitigation

Yu Li; Du Congju; Yan Zengqiang; Zhao Huijuan; He Shuangjiang

doi:10.11834/jig.230648

Review article


2024, Vol. 29, No. 11, pp. 3319-3344
Received: 2023-09-28
Revised: 2024-01-18
Print publication date: 2024-11-16

Yu Li, Du Congju, Yan Zengqiang, Zhao Huijuan, He Shuangjiang. 2024. Review of 2D human pose encoding and decoding methods: from the perspective of ambiguity mitigation. Journal of Image and Graphics, 29(11): 3319-3344. DOI: 10.11834/jig.230648.

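Abstract (translated from the Chinese)

Human pose estimation provides key technical support for many applications in entertainment, health, security, and other fields. Human pose encoding and decoding aims to extract features from raw input data, build them into an intermediate representation that is easier to process and interpret, and recover an interpretable human pose from that representation. In real-world scenes, however, factors such as illumination, motion blur, occlusion, complex poses, camera viewpoint, and image resolution mean that human pose estimation is often troubled by distributive ambiguity, scale ambiguity, and associative ambiguity. Sound encoding and decoding design is therefore the key to resolving these ambiguities. This review first introduces human pose modeling methods, which are the prerequisite for human pose encoding and decoding. For distributive ambiguity, methods are then presented in three groups: those based on distribution constraints, structural constraints, and iterative constraints. Scale ambiguity is divided into keypoint scale ambiguity and pixel scale ambiguity, and the related methods based on scale representation, unbiased transformation, and integral regression are introduced. For associative ambiguity, four classes of encoding and decoding methods are summarized: those based on graph optimization, limb vectors, instance centers, and reference labels. The performance of each method is also summarized and analyzed. Finally, future research directions for human pose encoding and decoding are discussed.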

Abstract

Within the various subfields of computer vision, human pose estimation is an active area of research. It aims to precisely localize the body parts or keypoints of each human instance in a given image or video and to reconstruct the skeleton structure of the human body. Human pose estimation offers technical support for various applications, such as human pose tracking, human action recognition, person re-identification, human-object interaction, and person image generation. Its uses span entertainment (such as virtual reality, augmented reality, and animation), health (such as healthcare and sports), and security (such as surveillance). Consequently, high-performance and real-time human pose estimation has emerged as a prominent focus of current computer vision research. Extensive research on human pose estimation methods has been conducted in recent years. One line of research focuses on developing and refining high-performance or lightweight network architectures. Notable examples include Hourglass, SimpleBaseline, the high-resolution network (HRNet), and Lite-HRNet. These architectures have found broad utility in various visual tasks, including object detection and instance segmentation. Another line of research is dedicated to introducing innovative pose encoding and decoding schemes, which are intended to construct accurate and robust human pose estimation models. The encoding and decoding processes represent a pivotal stage in extracting features from the input data and translating this information into comprehensible human poses. The encoding process primarily involves extracting features from the initial input data and molding them into an intermediate representation.
This intermediate form, which could be feature maps or latent vectors, simplifies processing and comprehension; the subsequent decoding process retrieves the final human pose from this encoded structure. Despite the considerable progress made in current research on human pose estimation, ambiguity remains a major obstacle in real-world scenarios. Diverse poses might be mapped to similar or overlapping low-dimensional representations, primarily because of factors such as illumination, motion blur, occlusion, complex poses, perspective, and resolution. This mapping leads to ambiguous and uncertain resulting poses, which constitutes the ambiguity challenge in human pose estimation. The challenge encompasses distributive, scale, and associative ambiguity. First, when a hand is occluded, the precise location of the wrist becomes uncertain, yielding distributive ambiguity. Second, the scale of the body in the image diminishes as the camera is positioned farther from the human instance, and the accurate scale is often difficult to ascertain without ample contextual detail, leading to scale ambiguity. Third, assigning the detected keypoints to the corresponding human instances becomes difficult when two human instances occlude each other, introducing associative ambiguity. Well-designed methods for encoding and decoding human poses enable human pose estimation to be modeled and solved appropriately. These methods provide effective optimization objectives and feature representations for the model, allowing reasonable and robust human pose estimation models to be constructed. Therefore, investigating encoding and decoding for human pose estimation is of substantial research importance. Most past review papers on human pose estimation have focused on the design of network structures, even though the ambiguity problem can markedly influence the performance of human pose estimation.
The objective of this paper is to provide a summarized analysis of current research on pose encoding and decoding methods, including a thorough investigation of the inherent ambiguity challenge in human pose estimation. Human pose modeling techniques, which directly affect how expressively a human pose can be represented, are introduced first. Second, pose encoding and decoding methods are categorized according to the ambiguity they address: distributive, scale, or associative. Three strategies are explored to address distributive ambiguity: distributive, structural, and iterative constraints. Scale ambiguity is further divided into keypoint-wise and pixel-wise scale ambiguity. The former is mainly addressed through scale-representation-based methods, and the latter can be addressed using unbiased-transformation-based and integral-regression-based methods. Approaches to associative ambiguity fall into four groups: graph-, limb-, center-, and embedding-based methods, which together offer multiple potential solutions. A summary and performance comparison of the encoding and decoding methods are provided to help clarify the strengths and limitations of each approach. Finally, potential directions for future development are discussed. This paper aims to establish a research direction: addressing the ambiguity problem in human pose estimation through encoding and decoding. Resolving these ambiguities is expected to broaden the potential applications of human pose estimation.
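To make the encode/decode idea and the pixel-wise scale ambiguity concrete, the following is a minimal sketch, not taken from the paper or from any surveyed method. It assumes the common Gaussian-heatmap encoding of a single keypoint and compares two generic decoders: a hard argmax, which can only return integer grid positions, and an integral (soft-argmax) decoder, which returns the expected coordinate under the normalized heatmap. The function names, the 16x12 grid, `sigma=1.5`, and the keypoint location are all illustrative assumptions, and the keypoint is deliberately placed away from the border so that truncation of the Gaussian is negligible.

```python
import math

def encode_heatmap(x, y, width, height, sigma=1.5):
    """Encode a sub-pixel keypoint (x, y) as an unnormalized 2D Gaussian heatmap."""
    return [[math.exp(-((col - x) ** 2 + (row - y) ** 2) / (2 * sigma ** 2))
             for col in range(width)]
            for row in range(height)]

def decode_argmax(heatmap):
    """Return the integer (x, y) grid position of the strongest response."""
    best = max(
        ((value, col, row)
         for row, line in enumerate(heatmap)
         for col, value in enumerate(line)),
        key=lambda item: item[0],
    )
    return best[1], best[2]

def decode_integral(heatmap):
    """Return the expected (x, y) under the normalized heatmap (soft-argmax)."""
    total = sum(sum(line) for line in heatmap)
    ex = sum(col * v for line in heatmap for col, v in enumerate(line)) / total
    ey = sum(row * v for row, line in enumerate(heatmap) for v in line) / total
    return ex, ey

# Hypothetical ground-truth keypoint with sub-pixel coordinates.
true_xy = (7.3, 5.6)
heatmap = encode_heatmap(*true_xy, width=16, height=12)

hard_xy = decode_argmax(heatmap)    # quantized to the grid
soft_xy = decode_integral(heatmap)  # continuous estimate

print("ground truth:   ", true_xy)
print("argmax decode:  ", hard_xy)
print("integral decode:", tuple(round(c, 3) for c in soft_xy))
```

In this setup the argmax decoder snaps the keypoint to the nearest grid cell, so its error is bounded only by half a pixel at heatmap resolution, while the integral decoder recovers a value close to the sub-pixel ground truth. When the heatmap is smaller than the input image, that quantization error is multiplied by the downsampling factor, which is one way to read the pixel-wise scale ambiguity described above.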

Keywords

references

Andriluka M ， Pishchulin L ， Gehler P and Schiele B . 2014 . 2D human pose estimation： new benchmark and state of the art analysis // Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition . Columbus， USA ： IEEE： 3686 - 3693 ［ DOI： 10.1109/CVPR.2014.471 http://dx.doi.org/10.1109/CVPR.2014.471 ］

Andriluka M ， Roth S and Schiele B . 2009 . Pictorial structures revisited： people detection and articulated pose estimation // Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition . Miami， USA ： IEEE： 1014 - 1021 ［ DOI： 10.1109/CVPR.2009.5206754 http://dx.doi.org/10.1109/CVPR.2009.5206754 ］

Brasó G ， Kister N and Leal-Taixé L . 2021 . The center of attention： center-keypoint grouping via attention for multi-person pose estimation // Proceedings of 2021 IEEE/CVF International Conference on Computer Vision （ICCV） . Montreal， Canada ： IEEE： 11833 - 11843 ［ DOI： 10.1109/ICCV48922.2021.01164 http://dx.doi.org/10.1109/ICCV48922.2021.01164 ］

Brown T B ， Mann B ， Ryder N ， Subbiah M ， Kaplan J D ， Dhariwal P ， Neelakantan A ， Shyam P ， Sastry G ， Askell A ， Agarwal S ， Herbert-Voss A ， Krueger G ， Henighan T ， Child R ， Ramesh A ， Ziegler D M ， Wu J ， Winter C ， Hesse C ， Chen M ， Sigler E ， Litwin M ， Gray S ， Chess B ， Clark J ， Berner C ， McCandlish S ， Radford A ， Sutskever I and Amodei D . 2020 . Language models are few-shot learners // Proceedings of the 34th International Conference on Neural Information Processing Systems . Vancouver， Canada ： Curran Associates Inc.： 1877 - 1901

Bulat A and Tzimiropoulos G . 2016 . Human pose estimation via convolutional part heatmap regression // Proceedings of the 14th European Conference on Computer Vision—ECCV 2016 . Amsterdam， the Netherlands ： Springer： 717 - 732 ［ DOI： 10.1007/978-3-319-46478-7_44 http://dx.doi.org/10.1007/978-3-319-46478-7_44 ］

Cao Z ， Hidalgo G ， Simon T ， Wei S E and Sheikh Y . 2021 . OpenPose： realtime multi-person 2D pose estimation using part affinity fields . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 43 （ 1 ）： 172 - 186 ［ DOI： 10.1109/TPAMI.2019.2929257 http://dx.doi.org/10.1109/TPAMI.2019.2929257 ］

Cao Z ， Simon T ， Wei S E and Sheikh Y . 2017 . Realtime multi-person 2D pose estimation using part affinity fields // Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition （CVPR） . Honolulu， USA ： IEEE： 7291 - 7299 ［ DOI： 10.1109/CVPR.2017.143 http://dx.doi.org/10.1109/CVPR.2017.143 ］

Carreira J ， Agrawal P ， Fragkiadaki K and Malik J . 2016 . Human pose estimation with iterative error feedback // Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition （CVPR） . Las Vegas， USA ： IEEE： 4733 - 4742 ［ DOI： 10.1109/CVPR.2016.512 http://dx.doi.org/10.1109/CVPR.2016.512 ］

Chen Y L ， Wang Z C ， Peng Y X ， Zhang Z Q ， Yu G and Sun J . 2018 . Cascaded pyramid network for multi-person pose estimation // Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Salt Lake City， USA ： IEEE： 7103 - 7112 ［ DOI： 10.1109/CVPR.2018.00742 http://dx.doi.org/10.1109/CVPR.2018.00742 ］

Cheng B W ， Xiao B ， Wang J D ， Shi H H ， Huang T S and Zhang L . 2020 . HigherHRNet： scale-aware representation learning for bottom-up human pose estimation // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Seattle， USA ： IEEE： 5386 - 5395 ［ DOI： 10.1109/CVPR42600.2020.00543 http://dx.doi.org/10.1109/CVPR42600.2020.00543 ］

Dai Y ， Wang X H ， Gao L L ， Song J K and Shen H T . 2021 . RSGNet： relation based skeleton graph network for crowded scenes pose estimation . Proceedings of the AAAI Conference on Artificial Intelligence ， 35 （ 2 ）： 1193 - 1200 ［ DOI： 10.1609/aaai.v35i2.16206 http://dx.doi.org/10.1609/aaai.v35i2.16206 ］

De Brabandere B ， Jia X ， Tuytelaars T and Van Gool L . 2016 . Dynamic filter networks // Proceedings of the 30th International Conference on Neural Information Processing Systems . Barcelona， Spain ： Curran Associates Inc.： 667 - 675 ［ DOI： 10.5555/3157096.3157171 http://dx.doi.org/10.5555/3157096.3157171 ］

Du C J ， Yan Z Q ， Yu H ， Yu L and Xiong Z X . 2022a . Hierarchical associative encoding and decoding for bottom-up human pose estimation . IEEE Transactions on Circuits and Systems for Video Technology ， 33 （ 4 ）： 1762 - 1775 ［ DOI： 10.1109/TCSVT.2022.3215564 http://dx.doi.org/10.1109/TCSVT.2022.3215564 ］

Du C J ， Yu H and Yu L . 2022b . A scale-sensitive heatmap representation for multi-person pose estimation . IET Image Processing ， 16 （ 4 ）： 1194 - 1207 ［ DOI： 10.1049/ipr2.12404 http://dx.doi.org/10.1049/ipr2.12404 ］

Duan H D ， Lin K Y ， Jin S ， Liu W T ， Qian C and Ouyang W L . 2019 . TRB： a novel triplet representation for understanding 2D human body // Proceedings of 2019 IEEE/CVF International Conference on Computer Vision （ICCV） . Seoul， Korea（South）： IEEE： 9478 - 9487 ［ DOI： 10.1109/ICCV.2019.00957 http://dx.doi.org/10.1109/ICCV.2019.00957 ］

Eichner M and Ferrari V . 2009 . Better appearance models for pictorial structures // Proceedings of 2009 the British Machine Vision Conference . London， UK ： BMVA Press： 1 - 11 ［ DOI： 10.5244/C.23.3 http://dx.doi.org/10.5244/C.23.3 ］

Fan X C ， Zheng K ， Lin Y W and Wang S . 2015 . Combining local appearance and holistic view： dual-source deep neural networks for human pose estimation // Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition （CVPR） . Boston， USA ： IEEE： 1347 - 1355 ［ DOI： 10.1109/CVPR.2015.7298740 http://dx.doi.org/10.1109/CVPR.2015.7298740 ］

Fang H S ， Li J F ， Tang H Y ， Xu C ， Zhu H Y ， Xiu Y L ， Li Y L and Lu C W . 2023 . AlphaPose： whole-body regional multi-person pose estimation and tracking in real-time . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 45 （ 6 ）： 7157 - 7173 ［ DOI： 10.1109/TPAMI.2022.3222784 http://dx.doi.org/10.1109/TPAMI.2022.3222784 ］

Felzenszwalb P ， McAllester D and Ramanan D . 2008 . A discriminatively trained， multiscale， deformable part model // Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition . Anchorage， USA ： IEEE： 1 - 8 ［ DOI： 10.1109/CVPR.2008.4587597 http://dx.doi.org/10.1109/CVPR.2008.4587597 ］

Felzenszwalb P F and Huttenlocher D P . 2005 . Pictorial structures for object recognition . International Journal of Computer Vision ， 61 （ 1 ）： 55 - 79 ［ DOI： 10.1023/B：VISI.0000042934.15159.49 http://dx.doi.org/10.1023/B：VISI.0000042934.15159.49 ］

Fischler M A and Elschlager R A . 1973 . The representation and matching of pictorial structures . IEEE Transactions on Computers， C-22 （ 1 ）： 67 - 92 ［ DOI： 10.1109/T-C.1973.223602 http://dx.doi.org/10.1109/T-C.1973.223602 ］

Geng Z G ， Sun K ， Xiao B ， Zhang Z X and Wang J D . 2021 . Bottom-up human pose estimation via disentangled keypoint regression // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Nashville， USA ： IEEE： 14671 - 14681 ［ DOI： 10.1109/CVPR46437.2021.01444 http://dx.doi.org/10.1109/CVPR46437.2021.01444 ］

Geng Z G ， Wang C Y ， Wei Y X ， Liu Z ， Li H Q and Hu H . 2023 . Human pose as compositional tokens // Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Vancouver， Canada ： IEEE： 660 - 671 ［ DOI： 10.1109/CVPR52729.2023.00071 http://dx.doi.org/10.1109/CVPR52729.2023.00071 ］

Gu K R ， Yang L L and Yao A . 2021 . Removing the bias of integral pose regression // Proceedings of 2021 IEEE/CVF International Conference on Computer Vision （ICCV） . Montreal， Canada ： IEEE： 11047 - 11056 ［ DOI： 10.1109/ICCV48922.2021.01088 http://dx.doi.org/10.1109/ICCV48922.2021.01088 ］

Gu K R ， Yang L L ， Mi M B and Yao A . 2023 . Bias-compensated integral regression for human pose estimation . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 45 （ 9 ）： 10687 - 10702 ［ DOI： 10.1109/TPAMI.2023.3264742 http://dx.doi.org/10.1109/TPAMI.2023.3264742 ］

He K M ， Gkioxari G ， Doll􀅡r P and Girshick R . 2017 . Mask R-CNN // Proceedings of 2017 IEEE International Conference on Computer Vision （ICCV） . Venice， Italy ： IEEE： 2980 - 2988 ［ DOI： 10.1109/ICCV.2017.322 http://dx.doi.org/10.1109/ICCV.2017.322 ］

Huang J J ， Zhu Z ， Guo F and Huang G . 2020 . The devil is in the details： delving into unbiased data processing for human pose estimation // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Seattle， USA ： IEEE： 5699 - 5708 ［ DOI： 10.1109/CVPR42600.2020.00574 http://dx.doi.org/10.1109/CVPR42600.2020.00574 ］

Insafutdinov E ， Pishchulin L ， Andres B ， Andriluka M and Schiele B . 2016 . Deepercut： a deeper， stronger， and faster multi-person pose estimation model // Proceedings of the 14th European Conference on Computer Vision—ECCV 2016 . Amsterdam， the Netherlands ： Springer： 34 - 50 ［ DOI： 10.1007/978-3-319-46466-4_3 http://dx.doi.org/10.1007/978-3-319-46466-4_3 ］

Jaderberg M ， Simonyan K ， Zisserman A and Kavukcuoglu K . 2015 . Spatial Transformer networks // Proceedings of the 28th International Conference on Neural Information Processing Systems . Montréal， Canada ： Curran Associates Inc.： 2017 - 2025 ［ DOI： 10.5555/2969442.2969465 http://dx.doi.org/10.5555/2969442.2969465 ］

Jang E ， Gu S X and Poole B . 2017 . Categorical reparameterization with Gumbel-Softmax ［EB/OL］. ［ 2023-11-01 ］. https://arxiv.org/pdf/1611.01144.pdf https://arxiv.org/pdf/1611.01144.pdf

Jin L ， Wang X J ， Nie X C ， Liu L Q ， Guo Y D and Zhao J . 2022 . Grouping by center： predicting centripetal offsets for the bottom-up human pose estimation . IEEE Transactions on Multimedia ， 25 ： 3364 - 3374 ［ DOI： 10.1109/TMM.2022.3159111 http://dx.doi.org/10.1109/TMM.2022.3159111 ］

Jin S ， Liu W T ， Xie E Z ， Wang W H ， Qian C ， Ouyang W L and Luo P . 2020 . Differentiable hierarchical graph grouping for multi-person pose estimation // Proceedings of the 16th European Conference on Computer Vision—ECCV 2020 . Glasgow， UK ： Springer： 718 - 734 ［ DOI： 10.1007/978-3-030-58571-6_42 http://dx.doi.org/10.1007/978-3-030-58571-6_42 ］

Kamel A ， Sheng B ， Li P ， Kim J and Feng D D . 2021 . Hybrid refinement-correction heatmaps for human pose estimation . IEEE Transactions on Multimedia ， 23 ： 1330 - 1342 ［ DOI： 10.1109/TMM.2020.2999181 http://dx.doi.org/10.1109/TMM.2020.2999181 ］

Kan Z H ， Chen S S ， Li Z and He Z H . 2022 . Self-constrained inference optimization on structural groups for human pose estimation // Proceedings of the 17th European Conference on Computer Vision—ECCV 2022 . Tel Aviv， Israel ： Springer： 729 - 745 ［ DOI： 10.1007/978-3-031-20065-6_42 http://dx.doi.org/10.1007/978-3-031-20065-6_42 ］

Kan Z H ， Chen S S ， Zhang C ， Tang Y S and He Z H . 2023 . Self-correctable and adaptable inference for generalizable human pose estimation // Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Vancouver， Canada ： IEEE： 5537 - 5546 ［ DOI： 10.1109/CVPR52729.2023.00536 http://dx.doi.org/10.1109/CVPR52729.2023.00536 ］

Ke L P ， Chang M C ， Qi H G and Lyu S W . 2018 . Multi-scale structure-aware network for human pose estimation // Proceedings of the 15th European Conference on Computer Vision—ECCV 2018 . Munich， Germany ： Springer： 731 - 746 ［ DOI： 10.1007/978-3-030-01216-8_44 http://dx.doi.org/10.1007/978-3-030-01216-8_44 ］

Khirodkar R ， Chari V ， Agrawal A and Tyagi A . 2021 . Multi-instance pose networks： rethinking top-down pose estimation // Proceedings of 2021 IEEE/CVF International Conference on Computer Vision （ICCV） . Montreal， Canada ： IEEE： 3102 - 3111 ［ DOI： 10.1109/ICCV48922.2021.00311 http://dx.doi.org/10.1109/ICCV48922.2021.00311 ］

Kreiss S ， Bertoni L and Alahi A . 2019 . PifPaf： composite fields for human pose estimation // Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Long Beach， USA ： IEEE： 11969 - 11978 ［ DOI： 10.1109/CVPR.2019.01225 http://dx.doi.org/10.1109/CVPR.2019.01225 ］

Kuhn H W . 1955 . The Hungarian method for the assignment problem . Naval Research Logistics Quarterly ， 2 （ 1/2 ）： 83 - 97 ［ DOI： 10.1002/nav.3800020109 http://dx.doi.org/10.1002/nav.3800020109 ］

Law H and Deng J . 2018 . CornerNet： detecting objects as paired keypoints // Proceedings of Computer Vision—ECCV 2018： the 15th European Conference . Munich， Germany ： Springer： 765 - 781 ［ DOI： 10.1007/978-3-030-01264-9_45 http://dx.doi.org/10.1007/978-3-030-01264-9_45 ］

Li J ， Su W and Wang Z F . 2020 . Simple pose： rethinking and improving a bottom-up approach for multi-person pose estimation . Proceedings of the AAAI Conference on Artificial Intelligence ， 34 （ 7 ）： 11354 - 11361 ［ DOI： 10.1609/aaai.v34i07.6797 http://dx.doi.org/10.1609/aaai.v34i07.6797 ］

Li J F ， Bian S Y ， Zeng A L ， Wang C ， Pang B ， Liu W T and Lu C W . 2021a . Human pose regression with residual log-likelihood estimation // Proceedings of 2021 IEEE/CVF International Conference on Computer Vision （ICCV） . Montreal， Canada ： IEEE： 11005 - 11014 ［ DOI： 10.1109/ICCV48922.2021.01084 http://dx.doi.org/10.1109/ICCV48922.2021.01084 ］

Li J F ， Chen T ， Shi R Q ， Lou Y J ， Li Y L and Lu C W . 2021b . Localization with sampling-argmax// Proceedings of the 35th Conference on Neural Information Processing Systems （NeurIPS 2021 . Sydney， Australia： Curran Associates Inc .： 27236 - 27248

Li S J ， Liu Z Q and Chan A B . 2015 . Heterogeneous multi-task learning for human pose estimation with deep convolutional neural network . International Journal of Computer Vision ， 113 （ 1 ）： 19 - 36 ［ DOI： 10.1007/s11263-014-0767-8 http://dx.doi.org/10.1007/s11263-014-0767-8 ］

Li S J ， Liu Z Q and Chan A B . 2014 . Heterogeneous multi-task learning for human pose estimation with deep convolutional neural network // Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops . Columbus， USA ： IEEE： 488 - 495 ［ DOI： 10.1109/CVPRW.2014.78 http://dx.doi.org/10.1109/CVPRW.2014.78 ］

Li W B ， Wang Z C ， Yin B Y ， Peng Q X ， Du Y M ， Xiao T Z ， Yu G ， Lu H T ， Wei Y C and Sun J . 2019 . Rethinking on multi-stage networks for human pose estimation ［EB/OL］. ［ 2023-11-01 ］. https://arxiv.org/pdf/1901.00148.pdf https://arxiv.org/pdf/1901.00148.pdf

Li Y J ， Yang S ， Liu P D ， Zhang S K ， Wang Y X ， Wang Z C ， Yang W K and Xia S T . 2022 . SimCC： a simple coordinate classification perspective for human pose estimation // Proceedings of the 17th European Conference on Computer Vision—ECCV 2022 . Tel Aviv， Israel ： Springer： 89 - 106 ［ DOI： 10.1007/978-3-031-20068-7_6 http://dx.doi.org/10.1007/978-3-031-20068-7_6 ］

Lin T Y ， Maire M ， Belongie S ， Hays J ， Perona P ， Ramanan D ， Doll􀅡r P and Lawrence Zitnick C . 2014 . Microsoft COCO： common objects in context // Proceedings of the 13th European Conference on Computer Vision—ECCV 2014 . Zürich， Switzerland ： Springer： 740 - 755 ［ DOI： 10.1007/978-3-319-10602-1_48 http://dx.doi.org/10.1007/978-3-319-10602-1_48 ］

Liu H ， Liu T T ， Chen Y ， Zhang Z L and Li Y F . 2022 . EHPE： skeleton cues-based gaussian coordinate encoding for efficient human pose estimation . IEEE Transactions on Multimedia ： 1 - 12 ［ DOI： 10.1109/TMM.2022.3197364 http://dx.doi.org/10.1109/TMM.2022.3197364 ］

Lowe D G . 1999 . Object recognition from local scale-invariant features // Proceedings of the 7th IEEE International Conference on Computer Vision . Kerkyra， Greece ： IEEE： 1150 - 1157 ［ DOI： 10.1109/ICCV.1999.790410 http://dx.doi.org/10.1109/ICCV.1999.790410 ］

Luo Z X ， Wang Z C ， Huang Y ， Wang L ， Tan T N and Zhou E J . 2021 . Rethinking the heatmap regression for bottom-up human pose estimation // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Nashville， TN， USA ： IEEE： 13259 - 13268 ［ DOI： 10.1109/CVPR46437.2021.01306 http://dx.doi.org/10.1109/CVPR46437.2021.01306 ］

Luvizon D C ， Tabia H and Picard D . 2019 . Human pose regression by combining indirect part detection and contextual information . Computers and Graphics ， 85 ： 15 - 22 ［ DOI： 10.1016/j.cag.2019.09.002 http://dx.doi.org/10.1016/j.cag.2019.09.002 ］

Mao W A ， Tian Z ， Wang X L and Shen C H . 2021 . FCPose： fully convolutional multi-person pose estimation with dynamic instance-aware convolutions // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Nashville， USA ： IEEE： 9030 - 9039 ［ DOI： 10.1109/CVPR46437.2021.00892 http://dx.doi.org/10.1109/CVPR46437.2021.00892 ］

Mao W A ， Ge Y T ， Shen C H ， Tian Z ， Wang X L ， Wang Z B and Van Den Hengel A . 2022 . Poseur： direct human pose regression with transformers // Proceedings of the 17th European Conference on Computer Vision—ECCV 2022 . Tel Aviv， Israel ： Springer： 72 - 88 ［ DOI： 10.1007/978-3-031-20068-7_5 http://dx.doi.org/10.1007/978-3-031-20068-7_5 ］

McNally W ， Vats K ， Wong A and McPhee J . 2022 . Rethinking keypoint representations： modeling keypoints and poses as objects for multi-person human pose estimation // Proceedings of the 17th European Conference on Computer Vision—ECCV 2022 . Tel Aviv， Israel ： Springer： 37 - 54 ［ DOI： 10.1007/978-3-031-20068-7_3 http://dx.doi.org/10.1007/978-3-031-20068-7_3 ］

Mikolajczyk K and Schmid C . 2005 . A performance evaluation of local descriptors . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 27 （ 10 ）： 1615 - 1630 ［ DOI： 10.1109/TPAMI.2005.188 http://dx.doi.org/10.1109/TPAMI.2005.188 ］

Moon G ， Chang J Y and Lee K M . 2019 . PoseFix： model-agnostic general human pose refinement network // Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Long Beach， USA ： IEEE： 7773 - 7781 ［ DOI： 10.1109/CVPR.2019.00796 http://dx.doi.org/10.1109/CVPR.2019.00796 ］

Neumann L and Vedaldi A . 2018 . Tiny people pose // Proceedings of the 14th Asian Conference on Computer Vision on Computer Vision-ACCV 2018 . Perth， Australia ： Springer： 558 - 574 ［ DOI： 10.1007/978-3-030-20893-6_35 http://dx.doi.org/10.1007/978-3-030-20893-6_35 ］

Newell A ， Huang Z A and Deng J . 2017 . Associative embedding ： end-to-end learning for joint detection and grouping// Proceedings of the 31st Conference on Neural Information Processing Systems （NIPS 2017 . LongBeach， USA： Curran Associates Inc .： 2277 - 2287

Newell A ， Yang K Y and Deng J . 2016 . Stacked hourglass networks for human pose estimation // Proceedings of the 14th European Conference on Computer Vision—ECCV 2016 . Amsterdam， the Netherlands ： Springer： 483 - 499 ［ DOI： 10.1007/978-3-319-46484-8_29 http://dx.doi.org/10.1007/978-3-319-46484-8_29 ］

Nibali A ， He Z ， Morgan S and Prendergast L . 2018 . Numerical coordinate regression with convolutional neural networks ［EB/OL］. ［ 2023-09-04 ］. https://arxiv.org/pdf/1801.07372.pdf https://arxiv.org/pdf/1801.07372.pdf

Nie X C ， Feng J S ， Xing J L ， Xiao S T and Yan S C . 2018a . Hierarchical contextual refinement networks for human pose estimation . IEEE Transactions on Image Processing ， 28 （ 2 ）： 924 - 936 ［ DOI： 10.1109/TIP.2018.2872628 http://dx.doi.org/10.1109/TIP.2018.2872628 ］

Nie X C ， Feng J S ， Xing J L and Yan S C . 2018b . Pose partition networks for multi-person pose estimation // Proceedings of the 15th European Conference on Computer Vision—ECCV 2018 . Munich， Germany ： Springer： 684 - 699 ［ DOI： 10.1007/978-3-030-01228-1_42 http://dx.doi.org/10.1007/978-3-030-01228-1_42 ］

Nie X C ， Feng J S ， Zhang J F and Yan S C . 2019 . Single-stage multi-person pose machines // Proceedings of 2019 IEEE/CVF International Conference on Computer Vision （ICCV） . Seoul， Korea（South）： IEEE： 6950 - 6959 ［ DOI： 10.1109/ICCV.2019.00705 http://dx.doi.org/10.1109/ICCV.2019.00705 ］

Ning G H ， Zhang Z and He Z Q . 2018 . Knowledge-guided deep fractal neural networks for human pose estimation . IEEE Transactions on Multimedia ， 20 （ 5 ）： 1246 - 1259 ［ DOI： 10.1109/TMM.2017.2762010 http://dx.doi.org/10.1109/TMM.2017.2762010 ］

Papandreou G ， Zhu T ， Chen L C ， Gidaris S ， Tompson J and Murphy K . 2018 . PersonLab： person pose estimation and instance segmentation with a bottom-up， part-based， geometric embedding model // Proceedings of the 15th European Conference on Computer Vision—ECCV 2018 . Munich， Germany ： Springer： 282 - 299 ［ DOI： 10.1007/978-3-030-01264-9_17 http://dx.doi.org/10.1007/978-3-030-01264-9_17 ］

Papandreou G ， Zhu T ， Kanazawa N ， Toshev A ， Tompson J ， Bregler C and Murphy K . 2017 . Towards accurate multi-person pose estimation in the wild // Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition （CVPR） . Honolulu， USA ： IEEE： 3711 - 3719 ［ DOI： 10.1109/CVPR.2017.395 http://dx.doi.org/10.1109/CVPR.2017.395 ］

Pishchulin L ， Insafutdinov E ， Tang S Y ， Andres B ， Andriluka M ， Gehler P and Schiele B . 2016 . DeepCut： joint subset partition and labeling for multi person pose estimation // Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition （CVPR） . Las Vegas， USA ： IEEE： 4929 - 4937 ［ DOI： 10.1109/CVPR.2016.533 http://dx.doi.org/10.1109/CVPR.2016.533 ］

Qiu L T ， Zhang X Y ， Li Y R ， Li G B ， Wu X J ， Xiong Z X ， Han X G and Cui S G . 2020a . Peeking into occluded joints： a novel framework for crowd pose estimation // Proceedings of the 16th European Conference on Computer Vision—ECCV 2020 . Glasgow， UK ： Springer： 488 - 504 ［ DOI： 10.1007/978-3-030-58529-7_29 http://dx.doi.org/10.1007/978-3-030-58529-7_29 ］

Qiu Z W ， Qiu K ， Fu J L and Fu D M . 2020b . DGCN： dynamic graph convolutional network for efficient multi-person pose estimation . Proceedings of the AAAI Conference on Artificial Intelligence ， 34 （ 7 ）： 11924 - 11931 ［ DOI： 10.1609/aaai.v34i07.6867 http://dx.doi.org/10.1609/aaai.v34i07.6867 ］

Qu H X ， Cai Y J ， Foo L G ， Kumar A and Liu J . 2023 . A characteristic function-based method for bottom-up human pose estimation // Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Vancouver， Canada ： IEEE： 13009 - 13018 ［ DOI： 10.1109/CVPR52729.2023.01250 http://dx.doi.org/10.1109/CVPR52729.2023.01250 ］

Qu H X ， Xu L ， Cai Y J ， Foo L G and Liu J . 2022 . Heatmap distribution matching for human pose estimation // Advances in Neural Information Processing Systems 35 （NeurIPS 2022 . OrleansNew， USA： Curran Associates Inc.： 24327 - 24339

Ronchi M R and Perona P . 2017 . Benchmarking and error diagnosis in multi-instance pose estimation // Proceedings of 2017 IEEE International Conference on Computer Vision （ICCV） . Venice， Italy ： IEEE： 369 - 378 ［ DOI： 10.1109/ICCV.2017.48 http://dx.doi.org/10.1109/ICCV.2017.48 ］

Sapp B and Taskar B . 2013 . MODEC： multimodal decomposable models for human pose estimation // Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition . Portland， USA ： IEEE： 3674 - 3681 ［ DOI： 10.1109/CVPR.2013.471 http://dx.doi.org/10.1109/CVPR.2013.471 ］

Sapp B ， Weiss D and Taskar B . 2011 . Parsing human motion with stretchable models // Proceedings of 2011 Conference of Computer Vision and Pattern Recognition （CVPR 2011） . Colorado Springs， USA ： IEEE： 1281 - 1288 ［ DOI： 10.1109/CVPR.2011.5995607 http://dx.doi.org/10.1109/CVPR.2011.5995607 ］

Shi D H ， Wei X ， Li L Q ， Ren Y and Tan W M . 2022 . End-to-end multi-person pose estimation with Transformers // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . New Orleans， USA ： IEEE： 11069 - 11078 ［ DOI： 10.1109/CVPR52688.2022.01079 http://dx.doi.org/10.1109/CVPR52688.2022.01079 ］

Shi D H ， Wei X ， Yu X D ， Tan W M ， Ren Y and Pu S L . 2021 . InsPose： instance-aware networks for single-stage multi-person pose estimation // Proceedings of the 29th ACM International Conference on Multimedia . New York， USA ： Association for Computing Machinery： 3079 - 3087 ［ DOI： 10.1145/3474085.3475447 http://dx.doi.org/10.1145/3474085.3475447 ］

Sun K ， Xiao B ， Liu D and Wang J D . 2019 . Deep high-resolution representation learning for human pose estimation // Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Long Beach， USA ： IEEE： 5686 - 5696 ［ DOI： 10.1109/CVPR.2019.00584 http://dx.doi.org/10.1109/CVPR.2019.00584 ］

Sun X ， Shang J X ， Liang S and Wei Y C . 2017 . Compositional human pose regression // Proceedings of 2017 IEEE International Conference on Computer Vision （ICCV） . Venice， Italy ： IEEE： 2621 - 2630 ［ DOI： 10.1109/ICCV.2017.284 http://dx.doi.org/10.1109/ICCV.2017.284 ］

Sun X ， Xiao B ， Wei F Y ， Liang S and Wei Y C . 2018 . Integral human pose regression // Proceedings of the 15th European Conference on Computer Vision—ECCV 2018 . Munich， Germany ： Springer： 539 - 553 ［ DOI： 10.1007/978-3-030-01231-1_33 http://dx.doi.org/10.1007/978-3-030-01231-1_33 ］

Tang W and Wu Y . 2019 . Does learning specific features for related parts help human pose estimation? // Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Long Beach， USA ： IEEE： 1107 - 1116 ［ DOI： 10.1109/CVPR.2019.00120 http://dx.doi.org/10.1109/CVPR.2019.00120 ］

Tang W ， Yu P and Wu Y . 2018 . Deeply learned compositional models for human pose estimation // Proceedings of the 15th European Conference on Computer Vision—ECCV 2018 . Munich， Germany ： Springer： 197 - 214 ［ DOI： 10.1007/978-3-030-01219-9_12 http://dx.doi.org/10.1007/978-3-030-01219-9_12 ］

Tian Y D ， Lawrence Zitnick C and Narasimhan S G . 2012 . Exploring the spatial hierarchy of mixture models for human pose estimation // Proceedings of the 12th European Conference on Computer Vision on Computer Vision—ECCV 2012 . Florence， Italy ： Springer： 256 - 269 ［ DOI： 10.1007/978-3-642-33715-4_19 http://dx.doi.org/10.1007/978-3-642-33715-4_19 ］

Tian Z ， Shen C H ， Chen H and He T . 2019 . FCOS： fully convolutional one-stage object detection // Proceedings of 2019 IEEE/CVF International Conference on Computer Vision （ICCV） . Seoul， Korea （South）： IEEE： 9626 - 9635 ［ DOI： 10.1109/ICCV.2019.00972 http://dx.doi.org/10.1109/ICCV.2019.00972 ］

Tompson J ， Goroshin R ， Jain A ， LeCun Y and Bregler C . 2015 . Efficient object localization using convolutional networks // Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition （CVPR） . Boston， USA ： IEEE： 648 - 656 ［ DOI： 10.1109/CVPR.2015.7298664 http://dx.doi.org/10.1109/CVPR.2015.7298664 ］

Tompson J ， Jain A ， LeCun Y and Bregler C . 2014 . Joint training of a convolutional network and a graphical model for human pose estimation // Proceedings of the 27th International Conference on Neural Information Processing Systems . Montréal， Canada ： MIT Press： 1799 - 1807 ［ DOI： 10.5555/2968826.2969027 http://dx.doi.org/10.5555/2968826.2969027 ］

Toshev A and Szegedy C . 2014 . DeepPose： human pose estimation via deep neural networks // Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition . Columbus， USA ： IEEE： 1653 - 1660 ［ DOI： 10.1109/CVPR.2014.214 http://dx.doi.org/10.1109/CVPR.2014.214 ］

Van Den Oord A ， Vinyals O and Kavukcuoglu K . 2017 . Neural discrete representation learning // Proceedings of the 31st International Conference on Neural Information Processing Systems . Long Beach， USA ： Curran Associates Inc.： 6309 - 6318 ［ DOI： 10.5555/3295222.3295378 http://dx.doi.org/10.5555/3295222.3295378 ］

Varamesh A and Tuytelaars T . 2020 . Mixture dense regression for object detection and human pose estimation // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Seattle， USA ： IEEE： 13083 - 13092 ［ DOI： 10.1109/CVPR42600.2020.01310 http://dx.doi.org/10.1109/CVPR42600.2020.01310 ］

Wang C ， Zhang F ， Zhu X T and Ge S S . 2022 . Low-resolution human pose estimation . Pattern Recognition ， 126 ： # 108579 ［ DOI： 10.1016/j.patcog.2022.108579 http://dx.doi.org/10.1016/j.patcog.2022.108579 ］

Wang D K and Zhang S L . 2022 . Contextual instance decoupling for robust multi-person pose estimation // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . New Orleans， USA ： IEEE： 11050 - 11058 ［ DOI： 10.1109/CVPR52688.2022.01078 http://dx.doi.org/10.1109/CVPR52688.2022.01078 ］

Wang D K ， Zhang S L and Hua G . 2021a . Robust pose estimation in crowded scenes with direct pose-level inference // Advances in Neural Information Processing Systems 34 （NeurIPS 2021）. USA： Curran Associates Inc.： 6278 - 6289

Wang J ， Long X ， Gao Y ， Ding E R and Wen S L . 2020 . Graph-PCNN： two stage human pose estimation with graph pose refinement // Proceedings of the 16th European Conference on Computer Vision—ECCV 2020 . Glasgow， UK ： Springer： 492 - 508 ［ DOI： 10.1007/978-3-030-58621-8_29 http://dx.doi.org/10.1007/978-3-030-58621-8_29 ］

Wang J D ， Sun K ， Cheng T H ， Jiang B R ， Deng C R ， Zhao Y ， Liu D ， Mu Y D ， Tan M K ， Wang X G ， Liu W Y and Xiao B . 2021b . Deep high-resolution representation learning for visual recognition . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 43 （ 10 ）： 3349 - 3364 ［ DOI： 10.1109/TPAMI.2020.2983686 http://dx.doi.org/10.1109/TPAMI.2020.2983686 ］

Wang S F ， Ihler A ， Kording K and Yarkony J . 2018 . Accelerating dynamic programs via nested benders decomposition with application to multi-person pose estimation // Proceedings of the 15th European Conference on Computer Vision—ECCV 2018 . Munich， Germany ： Springer： 677 - 692 ［ DOI： 10.1007/978-3-030-01264-9_40 http://dx.doi.org/10.1007/978-3-030-01264-9_40 ］

Wei F Y ， Sun X ， Li H Y ， Wang J D and Lin S . 2020 . Point-set anchors for object detection， instance segmentation and pose estimation // Proceedings of the 16th European Conference on Computer Vision—ECCV 2020 . Glasgow， UK ： Springer： 527 - 544 ［ DOI： 10.1007/978-3-030-58607-2_31 http://dx.doi.org/10.1007/978-3-030-58607-2_31 ］

Xiang L H ， Li J and Wang Z F . 2022 . Least-squares estimation of keypoint coordinate for human pose estimation // Proceedings of the 5th Chinese Conference on Pattern Recognition and Computer Vision . Shenzhen， China ： Springer： 448 - 460 ［ DOI： 10.1007/978-3-031-18913-5_35 http://dx.doi.org/10.1007/978-3-031-18913-5_35 ］

Xiao B ， Wu H P and Wei Y C . 2018 . Simple baselines for human pose estimation and tracking // Proceedings of the 15th European Conference on Computer Vision—ECCV 2018 . Munich， Germany ： Springer： 472 - 487 ［ DOI： 10.1007/978-3-030-01231-1_29 http://dx.doi.org/10.1007/978-3-030-01231-1_29 ］

Xiao Y B ， Su K H ， Wang X J ， Yu D D ， Jin L ， He M S and Yuan Z H . 2022a . QueryPose： sparse multi-person pose regression via spatial-aware part-level query // Advances in Neural Information Processing Systems 35 （NeurIPS 2022）. New Orleans， USA： Curran Associates Inc.： 12464 - 12477

Xiao Y B ， Wang X J ， Yu D D ， Wang G L ， Zhang Q and He M S . 2022b . AdaptivePose： human parts as adaptive points . Proceedings of the AAAI Conference on Artificial Intelligence ， 36 （ 3 ）： 2813 - 2821 ［ DOI： 10.1609/aaai.v36i3.20185 http://dx.doi.org/10.1609/aaai.v36i3.20185 ］

Xiao Y B ， Yu D D ， Wang X J ， Jin L ， Wang G L and Zhang Q . 2022c . Learning quality-aware representation for multi-person pose regression . Proceedings of the AAAI Conference on Artificial Intelligence ， 36 （ 3 ）： 2822 - 2830 ［ DOI： 10.1609/aaai.v36i3.20186 http://dx.doi.org/10.1609/aaai.v36i3.20186 ］

Xu X X ， Zou Q and Lin X . 2022 . Adaptive hypergraph neural network for multi-person pose estimation . Proceedings of the AAAI Conference on Artificial Intelligence ， 36 （ 3 ）： 2955 - 2963 ［ DOI： 10.1609/aaai.v36i3.20201 http://dx.doi.org/10.1609/aaai.v36i3.20201 ］

Xue N ， Wu T F ， Xia G S and Zhang L P . 2022 . Learning local-global contextual adaptation for multi-person pose estimation // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . New Orleans， USA ： IEEE： 13055 - 13064 ［ DOI： 10.1109/CVPR52688.2022.01272 http://dx.doi.org/10.1109/CVPR52688.2022.01272 ］

Yang J ， Zeng A L ， Liu S L ， Li F ， Zhang R M and Zhang L . 2023a . Explicit box detection unifies end-to-end multi-person pose estimation ［EB/OL］. ［ 2023-11-01 ］. https://arxiv.org/pdf/2302.01593.pdf

Yang S ， Feng Z ， Wang Z C ， Li Y J ， Zhang S K ， Quan Z B ， Xia S T and Yang W K . 2023b . Detecting and grouping keypoints for multi-person pose estimation using instance-aware attention . Pattern Recognition ， 136 ： # 109232 ［ DOI： 10.1016/j.patcog.2022.109232 http://dx.doi.org/10.1016/j.patcog.2022.109232 ］

Yang Y and Ramanan D . 2011 . Articulated pose estimation with flexible mixtures-of-parts // Proceedings of 2011 Conference on Computer Vision and Pattern Recognition （CVPR 2011） . Colorado Springs， USA ： IEEE： 1385 - 1392 ［ DOI： 10.1109/CVPR.2011.5995741 http://dx.doi.org/10.1109/CVPR.2011.5995741 ］

Yang Y and Ramanan D . 2012 . Articulated human detection with flexible mixtures of parts . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 35 （ 12 ）： 2878 - 2890 ［ DOI： 10.1109/TPAMI.2012.261 http://dx.doi.org/10.1109/TPAMI.2012.261 ］

Ye S H ， Zhang Y Y ， Hu J ， Cao L J ， Zhang S C ， Shen L ， Wang J ， Ding S H and Ji R R . 2023 . DistilPose： tokenized pose regression with heatmap distillation // Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Vancouver， Canada ： IEEE： 2163 - 2172 ［ DOI： 10.1109/CVPR52729.2023.00215 http://dx.doi.org/10.1109/CVPR52729.2023.00215 ］

Yu C Q ， Xiao B ， Gao C X ， Yuan L ， Zhang L ， Sang N and Wang J D . 2021 . Lite-HRNet： a lightweight high-resolution network // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Nashville， USA ： IEEE： 10435 - 10445 ［ DOI： 10.1109/CVPR46437.2021.01030 http://dx.doi.org/10.1109/CVPR46437.2021.01030 ］

Yu F and Koltun V . 2016 . Multi-scale context aggregation by dilated convolutions ［EB/OL］. ［ 2023-11-01 ］. https://arxiv.org/pdf/1511.07122.pdf

Yu H ， Du C J and Yu L . 2022 . Scale-aware heatmap representation for human pose estimation . Pattern Recognition Letters ， 154 ： 1 - 6 ［ DOI： 10.1016/j.patrec.2021.12.018 http://dx.doi.org/10.1016/j.patrec.2021.12.018 ］

Zatsiorsky V M . 2002 . Kinetics of Human Motion . Champaign County， USA ： Human Kinetics

Zhang F ， Zhu X T ， Dai H B ， Ye M and Zhu C . 2020 . Distribution-aware coordinate representation for human pose estimation // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR） . Seattle， USA ： IEEE： 7091 - 7100 ［ DOI： 10.1109/CVPR42600.2020.00712 http://dx.doi.org/10.1109/CVPR42600.2020.00712 ］

Zhang J ， Chen Z and Tao D C . 2021a . Towards high performance human keypoint detection . International Journal of Computer Vision ， 129 （ 9 ）： 2639 - 2662 ［ DOI： 10.1007/s11263-021-01482-8 http://dx.doi.org/10.1007/s11263-021-01482-8 ］

Zhang J B ， Zhu Z ， Lu J W ， Huang J J ， Huang G and Zhou J . 2021b . SIMPLE： single-network with mimicking and point learning for bottom-up human pose estimation . Proceedings of the AAAI Conference on Artificial Intelligence ， 35 （ 4 ）： 3342 - 3350 ［ DOI： 10.1609/aaai.v35i4.16446 http://dx.doi.org/10.1609/aaai.v35i4.16446 ］

Zhao L ， Xu J ， Gong C ， Yang J ， Zuo W M and Gao X B . 2020 . Learning to acquire the quality of human pose estimation . IEEE Transactions on Circuits and Systems for Video Technology ， 31 （ 4 ）： 1555 - 1568 ［ DOI： 10.1109/TCSVT.2020.3005522 http://dx.doi.org/10.1109/TCSVT.2020.3005522 ］

Zhou L ， Chen Y Y ， Wang J Q and Lu H Q . 2020 . Progressive Bi-C3D pose grammar for human pose estimation // Proceedings of the AAAI Conference on Artificial Intelligence . New York， USA ： 13033 - 13040 ［ DOI： 10.1609/aaai.v34i07.7004 https://doi.org/10.1609/aaai.v34i07.7004 ］

Zhou X Y ， Wang D Q and Krähenbühl P . 2019 . Objects as points ［EB/OL］. ［ 2023-09-04 ］. https://arxiv.org/pdf/1904.07850.pdf

Zuffi S ， Freifeld O and Black M J . 2012 . From pictorial structures to deformable structures // Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition . Providence， RI， USA ： IEEE： 3546 - 3553 ［ DOI： 10.1109/CVPR.2012.6248098 http://dx.doi.org/10.1109/CVPR.2012.6248098 ］
