Review of 2D human pose encoding and decoding methods: from the perspective of ambiguity mitigation
2024, Vol. 29, No. 11, pages 3319-3344
Print publication date: 2024-11-16
DOI: 10.11834/jig.230648
Yu Li, Du Congju, Yan Zengqiang, Zhao Huijuan, He Shuangjiang. 2024. Review of 2D human pose encoding and decoding methods: from the perspective of ambiguity mitigation. Journal of Image and Graphics, 29(11): 3319-3344
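Human pose estimation provides key technical support for numerous applications in entertainment, health, security, and other fields. Human pose encoding and decoding aim to extract features from the raw input data, construct them into an intermediate representation that is easier to process and understand, and recover a comprehensible human pose from that representation. In real-world scenes, however, factors such as illumination, motion blur, occlusion, complex poses, camera viewpoint, and image resolution mean that human pose estimation is often troubled by distributive, scale, and associative ambiguity. A well-designed encoding and decoding scheme is therefore key to resolving these ambiguities. This paper first introduces human pose modeling methods, which are a prerequisite for human pose encoding and decoding. For distributive ambiguity, methods are then presented in three groups: those based on distribution constraints, structural constraints, and iterative constraints. Scale ambiguity is divided into keypoint scale ambiguity and pixel scale ambiguity, and the related methods based on scale representation, unbiased transformation, and integral regression are introduced. For associative ambiguity, four classes of human pose encoding and decoding methods are summarized: those based on graph optimization, limb vectors, instance centers, and reference labels. The performance of each method is also summarized and analyzed. Finally, future research directions for human pose encoding and decoding are discussed.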
Among the subfields of computer vision, human pose estimation is an active area of research. It aims to precisely localize the body parts or keypoints of each human instance in a given image or video and to reconstruct the skeleton structure of the human body. Human pose estimation offers technical support for various applications, such as human pose tracking, human action recognition, person re-identification, human-object interaction, and person image generation. Its uses span entertainment (such as virtual reality, augmented reality, and animation), health (such as healthcare and sports), and security (such as surveillance). Consequently, high-performance and real-time human pose estimation has become a prominent focus of current computer vision research. Extensive research on human pose estimation methods has been conducted in recent years. One line of research focuses on developing and refining high-performance or lightweight network architectures. Notable examples include Hourglass, SimpleBaseline, the high-resolution network (HRNet), and Lite-HRNet. These architectures have also found broad use in other visual tasks, including object detection and instance segmentation. Another line of research is dedicated to introducing new pose encoding and decoding schemes, which are intended to yield accurate and robust human pose estimation models. The encoding and decoding processes form a pivotal stage in which features are extracted from the input data and translated into comprehensible human poses. The encoding process primarily involves extracting features from the initial input data and molding them into an intermediate representation.
This intermediate form, which could be feature maps or latent vectors, simplifies processing and comprehension; the subsequent decoding process retrieves the final human pose from this encoded structure. Despite the considerable progress made in current research on human pose estimation, ambiguity remains a major obstacle in real-world scenarios. Because of factors such as illumination, motion blur, occlusion, complex poses, perspective, and resolution, different poses might be mapped to similar or overlapping low-dimensional representations. This mapping leads to ambiguous and uncertain resulting poses, which constitutes the ambiguity challenge in human pose estimation. The challenge encompasses distributive, scale, and associative ambiguity. First, when a hand is occluded, the precise location of the wrist becomes uncertain, which yields distributive ambiguity. Second, the body appears smaller in the image when the camera is positioned farther from the human instance, and the accurate scale is often difficult to ascertain without ample contextual detail, which leads to scale ambiguity. Third, assigning the detected keypoints to the corresponding human instances becomes difficult when two human instances occlude each other, which introduces associative ambiguity. Well-designed methods for encoding and decoding human poses allow human pose estimation to be modeled and solved appropriately. These methods provide effective optimization objectives and feature representations for the model, allowing reasonable and robust human pose estimation models to be constructed. Therefore, investigating encoding and decoding for human pose estimation carries substantial research importance. Most past review papers on human pose estimation have focused primarily on the design of network structures, even though the ambiguity problem can markedly influence the performance of human pose estimation.
The objective of this paper is to provide a summarized analysis of current research on pose encoding and decoding methods, together with a thorough investigation of the ambiguity challenge inherent in human pose estimation. Human pose modeling techniques, which directly determine how expressively a human pose can be represented, are introduced first. Second, the pose encoding and decoding methods are categorized by the ambiguity they address: distributive, scale, or associative. Three strategies are explored to address distributive ambiguity: distributive, structural, and iterative constraints. Scale ambiguity is further divided into keypoint-wise and pixel-wise scale ambiguity. The former is mainly addressed through scale-representation-based methods, and the latter can be addressed using unbiased-transformation-based and integral-regression-based methods. Approaches to associative ambiguity are categorized into four groups: graph-, limb-, center-, and embedding-based methods. These diverse methods provide multiple potential solutions for dealing with associative ambiguity. A summary and performance comparison of the encoding and decoding methods are provided to help readers understand the strengths and limitations of each approach. Finally, potential directions for future development are discussed. This paper aims to highlight a research direction for the community: addressing the ambiguity problem in human pose estimation through encoding and decoding. Resolving these ambiguity challenges is expected to broaden the potential applications of human pose estimation.
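To make the notions of encoding, decoding, and pixel-wise scale ambiguity concrete, the following is a minimal sketch and not a method taken from any of the reviewed papers. It assumes a single keypoint encoded as a Gaussian heatmap on a small grid; the function names `encode_heatmap`, `decode_argmax`, and `decode_integral` are hypothetical, and the expectation-based decoder only follows the general idea of integral regression by normalizing the heatmap directly.

```python
import math


def encode_heatmap(x, y, width, height, sigma=1.0):
    """Encode a continuous keypoint (x, y) as a 2D Gaussian heatmap (illustrative)."""
    return [
        [math.exp(-((c - x) ** 2 + (r - y) ** 2) / (2 * sigma ** 2))
         for c in range(width)]
        for r in range(height)
    ]


def decode_argmax(heatmap):
    """Decode by picking the grid cell with the highest response.

    The result is restricted to integer pixel positions, so the sub-pixel
    part of the original coordinate is lost (a quantization error).
    """
    _, row, col = max(
        (value, r, c)
        for r, line in enumerate(heatmap)
        for c, value in enumerate(line)
    )
    return float(col), float(row)


def decode_integral(heatmap):
    """Decode by taking the expected coordinate under the normalized heatmap."""
    total = sum(sum(line) for line in heatmap)
    ex = sum(c * value for line in heatmap for c, value in enumerate(line)) / total
    ey = sum(r * value for r, line in enumerate(heatmap) for value in line) / total
    return ex, ey


# Assumed example: a keypoint at a sub-pixel location on a 12 x 10 grid.
heatmap = encode_heatmap(5.3, 4.6, width=12, height=10, sigma=1.0)
print("argmax decoding:  ", decode_argmax(heatmap))
print("integral decoding:", decode_integral(heatmap))
```

In this toy setting the argmax decoder can only return a grid position, whereas the expectation-based decoder recovers a value close to the continuous coordinate as long as the Gaussian lies well inside the grid. Near the border the expectation becomes biased because the Gaussian is truncated.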
Keywords: deep learning; human pose estimation; ambiguity problem; human pose encoding and decoding; human pose modeling
Andriluka M, Pishchulin L, Gehler P and Schiele B. 2014. 2D human pose estimation: new benchmark and state of the art analysis//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 3686-3693 [DOI: 10.1109/CVPR.2014.471http://dx.doi.org/10.1109/CVPR.2014.471]
Andriluka M, Roth S and Schiele B. 2009. Pictorial structures revisited: people detection and articulated pose estimation//Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, USA: IEEE: 1014-1021 [DOI: 10.1109/CVPR.2009.5206754http://dx.doi.org/10.1109/CVPR.2009.5206754]
Brasó G, Kister N and Leal-Taixé L. 2021. The center of attention: center-keypoint grouping via attention for multi-person pose estimation//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: IEEE: 11833-11843 [DOI: 10.1109/ICCV48922.2021.01164http://dx.doi.org/10.1109/ICCV48922.2021.01164]
Brown T B, Mann B, Ryder N, Subbiah M, Kaplan J D, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler D M, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskever I and Amodei D. 2020. Language models are few-shot learners//Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc.: 1877-1901
Bulat A and Tzimiropoulos G. 2016. Human pose estimation via convolutional part heatmap regression//Proceedings of the 14th European Conference on Computer Vision—ECCV 2016. Amsterdam, the Netherlands: Springer: 717-732 [DOI: 10.1007/978-3-319-46478-7_44http://dx.doi.org/10.1007/978-3-319-46478-7_44]
Cao Z, Hidalgo G, Simon T, Wei S E and Sheikh Y. 2021. OpenPose: realtime multi-person 2D pose estimation using part affinity fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(1): 172-186 [DOI: 10.1109/TPAMI.2019.2929257http://dx.doi.org/10.1109/TPAMI.2019.2929257]
Cao Z, Simon T, Wei S E and Sheikh Y. 2017. Realtime multi-person 2D pose estimation using part affinity fields//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 7291-7299 [DOI: 10.1109/CVPR.2017.143http://dx.doi.org/10.1109/CVPR.2017.143]
Carreira J, Agrawal P, Fragkiadaki K and Malik J. 2016. Human pose estimation with iterative error feedback//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 4733-4742 [DOI: 10.1109/CVPR.2016.512http://dx.doi.org/10.1109/CVPR.2016.512]
Chen Y L, Wang Z C, Peng Y X, Zhang Z Q, Yu G and Sun J. 2018. Cascaded pyramid network for multi-person pose estimation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7103-7112 [DOI: 10.1109/CVPR.2018.00742http://dx.doi.org/10.1109/CVPR.2018.00742]
Cheng B W, Xiao B, Wang J D, Shi H H, Huang T S and Zhang L. 2020. HigherHRNet: scale-aware representation learning for bottom-up human pose estimation//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 5386-5395 [DOI: 10.1109/CVPR42600.2020.00543http://dx.doi.org/10.1109/CVPR42600.2020.00543]
Dai Y, Wang X H, Gao L L, Song J K and Shen H T. 2021. RSGNet: relation based skeleton graph network for crowded scenes pose estimation. Proceedings of the AAAI Conference on Artificial Intelligence, 35(2): 1193-1200 [DOI: 10.1609/aaai.v35i2.16206http://dx.doi.org/10.1609/aaai.v35i2.16206]
De Brabandere B, Jia X, Tuytelaars T and Van Gool L. 2016. Dynamic filter networks//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: Curran Associates Inc.: 667-675 [DOI: 10.5555/3157096.3157171http://dx.doi.org/10.5555/3157096.3157171]
Du C J, Yan Z Q, Yu H, Yu L and Xiong Z X. 2022a. Hierarchical associative encoding and decoding for bottom-up human pose estimation. IEEE Transactions on Circuits and Systems for Video Technology, 33(4): 1762-1775 [DOI: 10.1109/TCSVT.2022.3215564http://dx.doi.org/10.1109/TCSVT.2022.3215564]
Du C J, Yu H and Yu L. 2022b. A scale-sensitive heatmap representation for multi-person pose estimation. IET Image Processing, 16(4): 1194-1207 [DOI: 10.1049/ipr2.12404http://dx.doi.org/10.1049/ipr2.12404]
Duan H D, Lin K Y, Jin S, Liu W T, Qian C and Ouyang W L. 2019. TRB: a novel triplet representation for understanding 2D human body//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea(South): IEEE: 9478-9487 [DOI: 10.1109/ICCV.2019.00957http://dx.doi.org/10.1109/ICCV.2019.00957]
Eichner M and Ferrari V. 2009. Better appearance models for pictorial structures//Proceedings of 2009 the British Machine Vision Conference. London, UK: BMVA Press: 1-11 [DOI: 10.5244/C.23.3http://dx.doi.org/10.5244/C.23.3]
Fan X C, Zheng K, Lin Y W and Wang S. 2015. Combining local appearance and holistic view: dual-source deep neural networks for human pose estimation//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE: 1347-1355 [DOI: 10.1109/CVPR.2015.7298740http://dx.doi.org/10.1109/CVPR.2015.7298740]
Fang H S, Li J F, Tang H Y, Xu C, Zhu H Y, Xiu Y L, Li Y L and Lu C W. 2023. AlphaPose: whole-body regional multi-person pose estimation and tracking in real-time. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(6): 7157-7173 [DOI: 10.1109/TPAMI.2022.3222784http://dx.doi.org/10.1109/TPAMI.2022.3222784]
Felzenszwalb P, McAllester D and Ramanan D. 2008. A discriminatively trained, multiscale, deformable part model//Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, USA: IEEE: 1-8 [DOI: 10.1109/CVPR.2008.4587597http://dx.doi.org/10.1109/CVPR.2008.4587597]
Felzenszwalb P F and Huttenlocher D P. 2005. Pictorial structures for object recognition. International Journal of Computer Vision, 61(1): 55-79 [DOI: 10.1023/B:VISI.0000042934.15159.49http://dx.doi.org/10.1023/B:VISI.0000042934.15159.49]
Fischler M A and Elschlager R A. 1973. The representation and matching of pictorial structures. IEEE Transactions on Computers, C-22(1): 67-92 [DOI: 10.1109/T-C.1973.223602http://dx.doi.org/10.1109/T-C.1973.223602]
Geng Z G, Sun K, Xiao B, Zhang Z X and Wang J D. 2021. Bottom-up human pose estimation via disentangled keypoint regression//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE: 14671-14681 [DOI: 10.1109/CVPR46437.2021.01444http://dx.doi.org/10.1109/CVPR46437.2021.01444]
Geng Z G, Wang C Y, Wei Y X, Liu Z, Li H Q and Hu H. 2023. Human pose as compositional tokens//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, Canada: IEEE: 660-671 [DOI: 10.1109/CVPR52729.2023.00071http://dx.doi.org/10.1109/CVPR52729.2023.00071]
Gu K R, Yang L L and Yao A. 2021. Removing the bias of integral pose regression//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: IEEE: 11047-11056 [DOI: 10.1109/ICCV48922.2021.01088http://dx.doi.org/10.1109/ICCV48922.2021.01088]
Gu K R, Yang L L, Mi M B and Yao A. 2023. Bias-compensated integral regression for human pose estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(9): 10687-10702 [DOI: 10.1109/TPAMI.2023.3264742http://dx.doi.org/10.1109/TPAMI.2023.3264742]
He K M, Gkioxari G, Dollr P and Girshick R. 2017. Mask R-CNN//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: 2980-2988 [DOI: 10.1109/ICCV.2017.322http://dx.doi.org/10.1109/ICCV.2017.322]
Huang J J, Zhu Z, Guo F and Huang G. 2020. The devil is in the details: delving into unbiased data processing for human pose estimation//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 5699-5708 [DOI: 10.1109/CVPR42600.2020.00574http://dx.doi.org/10.1109/CVPR42600.2020.00574]
Insafutdinov E, Pishchulin L, Andres B, Andriluka M and Schiele B. 2016. Deepercut: a deeper, stronger, and faster multi-person pose estimation model//Proceedings of the 14th European Conference on Computer Vision—ECCV 2016. Amsterdam, the Netherlands: Springer: 34-50 [DOI: 10.1007/978-3-319-46466-4_3http://dx.doi.org/10.1007/978-3-319-46466-4_3]
Jaderberg M, Simonyan K, Zisserman A and Kavukcuoglu K. 2015. Spatial Transformer networks//Proceedings of the 28th International Conference on Neural Information Processing Systems. Montréal, Canada: Curran Associates Inc.: 2017-2025 [DOI: 10.5555/2969442.2969465http://dx.doi.org/10.5555/2969442.2969465]
Jang E, Gu S X and Poole B. 2017. Categorical reparameterization with Gumbel-Softmax [EB/OL]. [2023-11-01]. https://arxiv.org/pdf/1611.01144.pdfhttps://arxiv.org/pdf/1611.01144.pdf
Jin L, Wang X J, Nie X C, Liu L Q, Guo Y D and Zhao J. 2022. Grouping by center: predicting centripetal offsets for the bottom-up human pose estimation. IEEE Transactions on Multimedia, 25: 3364-3374 [DOI: 10.1109/TMM.2022.3159111http://dx.doi.org/10.1109/TMM.2022.3159111]
Jin S, Liu W T, Xie E Z, Wang W H, Qian C, Ouyang W L and Luo P. 2020. Differentiable hierarchical graph grouping for multi-person pose estimation//Proceedings of the 16th European Conference on Computer Vision—ECCV 2020. Glasgow, UK: Springer: 718-734 [DOI: 10.1007/978-3-030-58571-6_42http://dx.doi.org/10.1007/978-3-030-58571-6_42]
Kamel A, Sheng B, Li P, Kim J and Feng D D. 2021. Hybrid refinement-correction heatmaps for human pose estimation. IEEE Transactions on Multimedia, 23: 1330-1342 [DOI: 10.1109/TMM.2020.2999181http://dx.doi.org/10.1109/TMM.2020.2999181]
Kan Z H, Chen S S, Li Z and He Z H. 2022. Self-constrained inference optimization on structural groups for human pose estimation//Proceedings of the 17th European Conference on Computer Vision—ECCV 2022. Tel Aviv, Israel: Springer: 729-745 [DOI: 10.1007/978-3-031-20065-6_42http://dx.doi.org/10.1007/978-3-031-20065-6_42]
Kan Z H, Chen S S, Zhang C, Tang Y S and He Z H. 2023. Self-correctable and adaptable inference for generalizable human pose estimation//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, Canada: IEEE: 5537-5546 [DOI: 10.1109/CVPR52729.2023.00536http://dx.doi.org/10.1109/CVPR52729.2023.00536]
Ke L P, Chang M C, Qi H G and Lyu S W. 2018. Multi-scale structure-aware network for human pose estimation//Proceedings of the 15th European Conference on Computer Vision—ECCV 2018. Munich, Germany: Springer: 731-746 [DOI: 10.1007/978-3-030-01216-8_44http://dx.doi.org/10.1007/978-3-030-01216-8_44]
Khirodkar R, Chari V, Agrawal A and Tyagi A. 2021. Multi-instance pose networks: rethinking top-down pose estimation//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: IEEE: 3102-3111 [DOI: 10.1109/ICCV48922.2021.00311http://dx.doi.org/10.1109/ICCV48922.2021.00311]
Kreiss S, Bertoni L and Alahi A. 2019. PifPaf: composite fields for human pose estimation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 11969-11978 [DOI: 10.1109/CVPR.2019.01225http://dx.doi.org/10.1109/CVPR.2019.01225]
Kuhn H W. 1955. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1/2): 83-97 [DOI: 10.1002/nav.3800020109http://dx.doi.org/10.1002/nav.3800020109]
Law H and Deng J. 2018. CornerNet: detecting objects as paired keypoints//Proceedings of Computer Vision—ECCV 2018: the 15th European Conference. Munich, Germany: Springer: 765-781 [DOI: 10.1007/978-3-030-01264-9_45http://dx.doi.org/10.1007/978-3-030-01264-9_45]
Li J, Su W and Wang Z F. 2020. Simple pose: rethinking and improving a bottom-up approach for multi-person pose estimation. Proceedings of the AAAI Conference on Artificial Intelligence, 34(7): 11354-11361 [DOI: 10.1609/aaai.v34i07.6797http://dx.doi.org/10.1609/aaai.v34i07.6797]
Li J F, Bian S Y, Zeng A L, Wang C, Pang B, Liu W T and Lu C W. 2021a. Human pose regression with residual log-likelihood estimation//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: IEEE: 11005-11014 [DOI: 10.1109/ICCV48922.2021.01084http://dx.doi.org/10.1109/ICCV48922.2021.01084]
Li J F, Chen T, Shi R Q, Lou Y J, Li Y L and Lu C W. 2021b. Localization with sampling-argmax//Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS2021. Sydney, Australia: Curran Associates Inc.: 27236-27248
Li S J, Liu Z Q and Chan A B. 2015. Heterogeneous multi-task learning for human pose estimation with deep convolutional neural network. International Journal of Computer Vision, 113(1): 19-36 [DOI: 10.1007/s11263-014-0767-8http://dx.doi.org/10.1007/s11263-014-0767-8]
Li S J, Liu Z Q and Chan A B. 2014. Heterogeneous multi-task learning for human pose estimation with deep convolutional neural network//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Columbus, USA: IEEE: 488-495 [DOI: 10.1109/CVPRW.2014.78http://dx.doi.org/10.1109/CVPRW.2014.78]
Li W B, Wang Z C, Yin B Y, Peng Q X, Du Y M, Xiao T Z, Yu G, Lu H T, Wei Y C and Sun J. 2019. Rethinking on multi-stage networks for human pose estimation [EB/OL]. [2023-11-01]. https://arxiv.org/pdf/1901.00148.pdfhttps://arxiv.org/pdf/1901.00148.pdf
Li Y J, Yang S, Liu P D, Zhang S K, Wang Y X, Wang Z C, Yang W K and Xia S T. 2022. SimCC: a simple coordinate classification perspective for human pose estimation//Proceedings of the 17th European Conference on Computer Vision—ECCV 2022. Tel Aviv, Israel: Springer: 89-106 [DOI: 10.1007/978-3-031-20068-7_6http://dx.doi.org/10.1007/978-3-031-20068-7_6]
Lin T Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollr P and Lawrence Zitnick C. 2014. Microsoft COCO: common objects in context//Proceedings of the 13th European Conference on Computer Vision—ECCV 2014. Zürich, Switzerland: Springer: 740-755 [DOI: 10.1007/978-3-319-10602-1_48http://dx.doi.org/10.1007/978-3-319-10602-1_48]
Liu H, Liu T T, Chen Y, Zhang Z L and Li Y F. 2022. EHPE: skeleton cues-based gaussian coordinate encoding for efficient human pose estimation. IEEE Transactions on Multimedia: 1-12 [DOI: 10.1109/TMM.2022.3197364http://dx.doi.org/10.1109/TMM.2022.3197364]
Lowe D G. 1999. Object recognition from local scale-invariant features//Proceedings of the 7th IEEE International Conference on Computer Vision. Kerkyra, Greece: IEEE: 1150-1157 [DOI: 10.1109/ICCV.1999.790410http://dx.doi.org/10.1109/ICCV.1999.790410]
Luo Z X, Wang Z C, Huang Y, Wang L, Tan T N and Zhou E J. 2021. Rethinking the heatmap regression for bottom-up human pose estimation//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, TN, USA: IEEE: 13259-13268 [DOI: 10.1109/CVPR46437.2021.01306http://dx.doi.org/10.1109/CVPR46437.2021.01306]
Luvizon D C, Tabia H and Picard D. 2019. Human pose regression by combining indirect part detection and contextual information. Computers and Graphics, 85: 15-22 [DOI: 10.1016/j.cag.2019.09.002http://dx.doi.org/10.1016/j.cag.2019.09.002]
Mao W A, Tian Z, Wang X L and Shen C H. 2021. FCPose: fully convolutional multi-person pose estimation with dynamic instance-aware convolutions//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE: 9030-9039 [DOI: 10.1109/CVPR46437.2021.00892http://dx.doi.org/10.1109/CVPR46437.2021.00892]
Mao W A, Ge Y T, Shen C H, Tian Z, Wang X L, Wang Z B and Van Den Hengel A. 2022. Poseur: direct human pose regression with transformers//Proceedings of the 17th European Conference on Computer Vision—ECCV 2022. Tel Aviv, Israel: Springer: 72-88 [DOI: 10.1007/978-3-031-20068-7_5http://dx.doi.org/10.1007/978-3-031-20068-7_5]
McNally W, Vats K, Wong A and McPhee J. 2022. Rethinking keypoint representations: modeling keypoints and poses as objects for multi-person human pose estimation//Proceedings of the 17th European Conference on Computer Vision—ECCV 2022. Tel Aviv, Israel: Springer: 37-54 [DOI: 10.1007/978-3-031-20068-7_3http://dx.doi.org/10.1007/978-3-031-20068-7_3]
Mikolajczyk K and Schmid C. 2005. A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(10): 1615-1630 [DOI: 10.1109/TPAMI.2005.188http://dx.doi.org/10.1109/TPAMI.2005.188]
Moon G, Chang J Y and Lee K M. 2019. PoseFix: model-agnostic general human pose refinement network//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 7773-7781 [DOI: 10.1109/CVPR.2019.00796http://dx.doi.org/10.1109/CVPR.2019.00796]
Neumann L and Vedaldi A. 2018. Tiny people pose//Proceedings of the 14th Asian Conference on Computer Vision on Computer Vision-ACCV 2018. Perth, Australia: Springer: 558-574 [DOI: 10.1007/978-3-030-20893-6_35http://dx.doi.org/10.1007/978-3-030-20893-6_35]
Newell A, Huang Z A and Deng J. 2017. Associative embedding: end-to-end learning for joint detection and grouping//Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017. LongBeach, USA: Curran Associates Inc.: 2277-2287
Newell A, Yang K Y and Deng J. 2016. Stacked hourglass networks for human pose estimation//Proceedings of the 14th European Conference on Computer Vision—ECCV 2016. Amsterdam, the Netherlands: Springer: 483-499 [DOI: 10.1007/978-3-319-46484-8_29http://dx.doi.org/10.1007/978-3-319-46484-8_29]
Nibali A, He Z, Morgan S and Prendergast L. 2018. Numerical coordinate regression with convolutional neural networks [EB/OL]. [2023-09-04]. https://arxiv.org/pdf/1801.07372.pdfhttps://arxiv.org/pdf/1801.07372.pdf
Nie X C, Feng J S, Xing J L, Xiao S T and Yan S C. 2018a. Hierarchical contextual refinement networks for human pose estimation. IEEE Transactions on Image Processing, 28(2): 924-936 [DOI: 10.1109/TIP.2018.2872628http://dx.doi.org/10.1109/TIP.2018.2872628]
Nie X C, Feng J S, Xing J L and Yan S C. 2018b. Pose partition networks for multi-person pose estimation//Proceedings of the 15th European Conference on Computer Vision—ECCV 2018. Munich, Germany: Springer: 684-699 [DOI: 10.1007/978-3-030-01228-1_42http://dx.doi.org/10.1007/978-3-030-01228-1_42]
Nie X C, Feng J S, Zhang J F and Yan S C. 2019. Single-stage multi-person pose machines//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea(South): IEEE: 6950-6959 [DOI: 10.1109/ICCV.2019.00705http://dx.doi.org/10.1109/ICCV.2019.00705]
Ning G H, Zhang Z and He Z Q. 2018. Knowledge-guided deep fractal neural networks for human pose estimation. IEEE Transactions on Multimedia, 20(5): 1246-1259 [DOI: 10.1109/TMM.2017.2762010http://dx.doi.org/10.1109/TMM.2017.2762010]
Papandreou G, Zhu T, Chen L C, Gidaris S, Tompson J and Murphy K. 2018. PersonLab: person pose estimation and instance segmentation with a bottom-up, part-based, geometric embedding model//Proceedings of the 15th European Conference on Computer Vision—ECCV 2018. Munich, Germany: Springer: 282-299 [DOI: 10.1007/978-3-030-01264-9_17http://dx.doi.org/10.1007/978-3-030-01264-9_17]
Papandreou G, Zhu T, Kanazawa N, Toshev A, Tompson J, Bregler C and Murphy K. 2017. Towards accurate multi-person pose estimation in the wild//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 3711-3719 [DOI: 10.1109/CVPR.2017.395http://dx.doi.org/10.1109/CVPR.2017.395]
Pishchulin L, Insafutdinov E, Tang S Y, Andres B, Andriluka M, Gehler P and Schiele B. 2016. DeepCut: joint subset partition and labeling for multi person pose estimation//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 4929-4937 [DOI: 10.1109/CVPR.2016.533http://dx.doi.org/10.1109/CVPR.2016.533]
Qiu L T, Zhang X Y, Li Y R, Li G B, Wu X J, Xiong Z X, Han X G and Cui S G. 2020a. Peeking into occluded joints: a novel framework for crowd pose estimation//Proceedings of the 16th European Conference on Computer Vision—ECCV 2020. Glasgow, UK: Springer: 488-504 [DOI: 10.1007/978-3-030-58529-7_29http://dx.doi.org/10.1007/978-3-030-58529-7_29]
Qiu Z W, Qiu K, Fu J L and Fu D M. 2020b. DGCN: dynamic graph convolutional network for efficient multi-person pose estimation. Proceedings of the AAAI Conference on Artificial Intelligence, 34(7): 11924-11931 [DOI: 10.1609/aaai.v34i07.6867http://dx.doi.org/10.1609/aaai.v34i07.6867]
Qu H X, Cai Y J, Foo L G, Kumar A and Liu J. 2023. A characteristic function-based method for bottom-up human pose estimation//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, Canada: IEEE: 13009-13018 [DOI: 10.1109/CVPR52729.2023.01250http://dx.doi.org/10.1109/CVPR52729.2023.01250]
Qu H X, Xu L, Cai Y J, Foo L G and Liu J. 2022. Heatmap distribution matching for human pose estimation//Advances in Neural Information Processing Systems 35 (NeurIPS 2022. OrleansNew, USA: Curran Associates Inc.: 24327-24339
Ronchi M R and Perona P. 2017. Benchmarking and error diagnosis in multi-instance pose estimation//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: 369-378 [DOI: 10.1109/ICCV.2017.48http://dx.doi.org/10.1109/ICCV.2017.48]
Sapp B and Taskar B. 2013. MODEC: multimodal decomposable models for human pose estimation//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, USA: IEEE: 3674-3681 [DOI: 10.1109/CVPR.2013.471http://dx.doi.org/10.1109/CVPR.2013.471]
Sapp B, Weiss D and Taskar B. 2011. Parsing human motion with stretchable models//Proceedings of 2011 Conference of Computer Vision and Pattern Recognition (CVPR 2011). Colorado Springs, USA: IEEE: 1281-1288 [DOI: 10.1109/CVPR.2011.5995607http://dx.doi.org/10.1109/CVPR.2011.5995607]
Shi D H, Wei X, Li L Q, Ren Y and Tan W M. 2022. End-to-end multi-person pose estimation with Transformers//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE: 11069-11078 [DOI: 10.1109/CVPR52688.2022.01079http://dx.doi.org/10.1109/CVPR52688.2022.01079]
Shi D H, Wei X, Yu X D, Tan W M, Ren Y and Pu S L. 2021. InsPose: instance-aware networks for single-stage multi-person pose estimation//Proceedings of the 29th ACM International Conference on Multimedia. New York, USA: Association for Computing Machinery: 3079-3087 [DOI: 10.1145/3474085.3475447http://dx.doi.org/10.1145/3474085.3475447]
Sun K, Xiao B, Liu D and Wang J D. 2019. Deep high-resolution representation learning for human pose estimation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 5686-5696 [DOI: 10.1109/CVPR.2019.00584http://dx.doi.org/10.1109/CVPR.2019.00584]
Sun X, Shang J X, Liang S and Wei Y C. 2017. Compositional human pose regression//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: 2621-2630 [DOI: 10.1109/ICCV.2017.284http://dx.doi.org/10.1109/ICCV.2017.284]
Sun X, Xiao B, Wei F Y, Liang S and Wei Y C. 2018. Integral human pose regression//Proceedings of the 15th European Conference on Computer Vision—ECCV 2018. Munich, Germany: Springer: 539-553 [DOI: 10.1007/978-3-030-01231-1_33http://dx.doi.org/10.1007/978-3-030-01231-1_33]
Tang W and Wu Y. 2019. Does learning specific features for related parts help human pose estimation?//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 1107-1116 [DOI: 10.1109/CVPR.2019.00120http://dx.doi.org/10.1109/CVPR.2019.00120]
Tang W, Yu P and Wu Y. 2018. Deeply learned compositional models for human pose estimation//Proceedings of the 15th European Conference on Computer Vision—ECCV 2018. Munich, Germany: Springer: 197-214 [DOI: 10.1007/978-3-030-01219-9_12http://dx.doi.org/10.1007/978-3-030-01219-9_12]
Tian Y D, Lawrence Zitnick C and Narasimhan S G. 2012. Exploring the spatial hierarchy of mixture models for human pose estimation//Proceedings of the 12th European Conference on Computer Vision on Computer Vision—ECCV 2012. Florence, Italy: Springer: 256-269 [DOI: 10.1007/978-3-642-33715-4_19http://dx.doi.org/10.1007/978-3-642-33715-4_19]
Tian Z, Shen C H, Chen H and He T. 2019. FCOS: fully convolutional one-stage object detection//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE: 9626-9635 [DOI: 10.1109/ICCV.2019.00972]
Tompson J, Goroshin R, Jain A, LeCun Y and Bregler C. 2015. Efficient object localization using convolutional networks//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE: 648-656 [DOI: 10.1109/CVPR.2015.7298664]
Tompson J, Jain A, LeCun Y and Bregler C. 2014. Joint training of a convolutional network and a graphical model for human pose estimation//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montréal, Canada: MIT Press: 1799-1807 [DOI: 10.5555/2968826.2969027]
Toshev A and Szegedy C. 2014. DeepPose: human pose estimation via deep neural networks//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 1653-1660 [DOI: 10.1109/CVPR.2014.214]
Van Den Oord A, Vinyals O and Kavukcuoglu K. 2017. Neural discrete representation learning//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 6309-6318 [DOI: 10.5555/3295222.3295378]
Varamesh A and Tuytelaars T. 2020. Mixture dense regression for object detection and human pose estimation//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 13083-13092 [DOI: 10.1109/CVPR42600.2020.01310]
Wang C, Zhang F, Zhu X T and Ge S S. 2022. Low-resolution human pose estimation. Pattern Recognition, 126: #108579 [DOI: 10.1016/j.patcog.2022.108579]
Wang D K and Zhang S L. 2022. Contextual instance decoupling for robust multi-person pose estimation//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE: 11050-11058 [DOI: 10.1109/CVPR52688.2022.01078]
Wang D K, Zhang S L and Hua G. 2021a. Robust pose estimation in crowded scenes with direct pose-level inference//Advances in Neural Information Processing Systems 34 (NeurIPS 2021). USA: Curran Associates Inc.: 6278-6289
Wang J, Long X, Gao Y, Ding E R and Wen S L. 2020. Graph-PCNN: two stage human pose estimation with graph pose refinement//Proceedings of the 16th European Conference on Computer Vision (ECCV 2020). Glasgow, UK: Springer: 492-508 [DOI: 10.1007/978-3-030-58621-8_29]
Wang J D, Sun K, Cheng T H, Jiang B R, Deng C R, Zhao Y, Liu D, Mu Y D, Tan M K, Wang X G, Liu W Y and Xiao B. 2021b. Deep high-resolution representation learning for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(10): 3349-3364 [DOI: 10.1109/TPAMI.2020.2983686]
Wang S F, Ihler A, Kording K and Yarkony J. 2018. Accelerating dynamic programs via nested benders decomposition with application to multi-person pose estimation//Proceedings of the 15th European Conference on Computer Vision (ECCV 2018). Munich, Germany: Springer: 677-692 [DOI: 10.1007/978-3-030-01264-9_40]
Wei F Y, Sun X, Li H Y, Wang J D and Lin S. 2020. Point-set anchors for object detection, instance segmentation and pose estimation//Proceedings of the 16th European Conference on Computer Vision (ECCV 2020). Glasgow, UK: Springer: 527-544 [DOI: 10.1007/978-3-030-58607-2_31]
Xiang L H, Li J and Wang Z F. 2022. Least-squares estimation of keypoint coordinate for human pose estimation//Proceedings of the 5th Chinese Conference on Pattern Recognition and Computer Vision. Shenzhen, China: Springer: 448-460 [DOI: 10.1007/978-3-031-18913-5_35]
Xiao B, Wu H P and Wei Y C. 2018. Simple baselines for human pose estimation and tracking//Proceedings of the 15th European Conference on Computer Vision (ECCV 2018). Munich, Germany: Springer: 472-487 [DOI: 10.1007/978-3-030-01231-1_29]
Xiao Y B, Su K H, Wang X J, Yu D D, Jin L, He M S and Yuan Z H. 2022a. QueryPose: sparse multi-person pose regression via spatial-aware part-level query//Advances in Neural Information Processing Systems 35 (NeurIPS 2022). New Orleans, USA: Curran Associates Inc.: 12464-12477
Xiao Y B, Wang X J, Yu D D, Wang G L, Zhang Q and He M S. 2022b. AdaptivePose: human parts as adaptive points. Proceedings of the AAAI Conference on Artificial Intelligence, 36(3): 2813-2821 [DOI: 10.1609/aaai.v36i3.20185]
Xiao Y B, Yu D D, Wang X J, Jin L, Wang G L and Zhang Q. 2022c. Learning quality-aware representation for multi-person pose regression. Proceedings of the AAAI Conference on Artificial Intelligence, 36(3): 2822-2830 [DOI: 10.1609/aaai.v36i3.20186]
Xu X X, Zou Q and Lin X. 2022. Adaptive hypergraph neural network for multi-person pose estimation. Proceedings of the AAAI Conference on Artificial Intelligence, 36(3): 2955-2963 [DOI: 10.1609/aaai.v36i3.20201]
Xue N, Wu T F, Xia G S and Zhang L P. 2022. Learning local-global contextual adaptation for multi-person pose estimation//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE: 13055-13064 [DOI: 10.1109/CVPR52688.2022.01272]
Yang J, Zeng A L, Liu S L, Li F, Zhang R M and Zhang L. 2023a. Explicit box detection unifies end-to-end multi-person pose estimation [EB/OL]. [2023-11-01]. https://arxiv.org/pdf/2302.01593.pdf
Yang S, Feng Z, Wang Z C, Li Y J, Zhang S K, Quan Z B, Xia S T and Yang W K. 2023b. Detecting and grouping keypoints for multi-person pose estimation using instance-aware attention. Pattern Recognition, 136: #109232 [DOI: 10.1016/j.patcog.2022.109232]
Yang Y and Ramanan D. 2011. Articulated pose estimation with flexible mixtures-of-parts//Proceedings of 2011 Conference on Computer Vision and Pattern Recognition (CVPR 2011). Colorado Springs, USA: IEEE: 1385-1392 [DOI: 10.1109/CVPR.2011.5995741]
Yang Y and Ramanan D. 2012. Articulated human detection with flexible mixtures of parts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(12): 2878-2890 [DOI: 10.1109/TPAMI.2012.261]
Ye S H, Zhang Y Y, Hu J, Cao L J, Zhang S C, Shen L, Wang J, Ding S H and Ji R R. 2023. DistilPose: tokenized pose regression with heatmap distillation//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, Canada: IEEE: 2163-2172 [DOI: 10.1109/CVPR52729.2023.00215]
Yu C Q, Xiao B, Gao C X, Yuan L, Zhang L, Sang N and Wang J D. 2021. Lite-HRNet: a lightweight high-resolution network//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE: 10435-10445 [DOI: 10.1109/CVPR46437.2021.01030]
Yu F and Koltun V. 2016. Multi-scale context aggregation by dilated convolutions [EB/OL]. [2023-11-01]. https://arxiv.org/pdf/1511.07122.pdf
Yu H, Du C J and Yu L. 2022. Scale-aware heatmap representation for human pose estimation. Pattern Recognition Letters, 154: 1-6 [DOI: 10.1016/j.patrec.2021.12.018]
Zatsiorsky V M. 2002. Kinetics of Human Motion. Champaign, USA: Human Kinetics
Zhang F, Zhu X T, Dai H B, Ye M and Zhu C. 2020. Distribution-aware coordinate representation for human pose estimation//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 7091-7100 [DOI: 10.1109/CVPR42600.2020.00712]
Zhang J, Chen Z and Tao D C. 2021a. Towards high performance human keypoint detection. International Journal of Computer Vision, 129(9): 2639-2662 [DOI: 10.1007/s11263-021-01482-8]
Zhang J B, Zhu Z, Lu J W, Huang J J, Huang G and Zhou J. 2021b. SIMPLE: single-network with mimicking and point learning for bottom-up human pose estimation. Proceedings of the AAAI Conference on Artificial Intelligence, 35(4): 3342-3350 [DOI: 10.1609/aaai.v35i4.16446]
Zhao L, Xu J, Gong C, Yang J, Zuo W M and Gao X B. 2020. Learning to acquire the quality of human pose estimation. IEEE Transactions on Circuits and Systems for Video Technology, 31(4): 1555-1568 [DOI: 10.1109/TCSVT.2020.3005522]
Zhou L, Chen Y Y, Wang J Q and Lu H Q. 2020. Progressive Bi-C3D pose grammar for human pose estimation//Proceedings of the AAAI Conference on Artificial Intelligence. New York, USA: 13033-13040 [DOI: 10.1609/aaai.v34i07.7004]
Zhou X Y, Wang D Q and Krähenbühl P. 2019. Objects as points [EB/OL]. [2023-09-04]. https://arxiv.org/pdf/1904.07850.pdf
Zuffi S, Freifeld O and Black M J. 2012. From pictorial structures to deformable structures//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, USA: IEEE: 3546-3553 [DOI: 10.1109/CVPR.2012.6248098]