A review of skeleton-based human action recognition
Vol. 28, Issue 12, Pages 3651-3669 (2023)
Published: 16 December 2023
DOI: 10.11834/jig.230046
Lu Jian, Li Xuanfeng, Zhao Bo, Zhou Jian. 2023. A review of skeleton-based human action recognition. Journal of Image and Graphics, 28(12):3651-3669
Skeleton-based human action recognition aims to correctly classify the actions contained in an input skeleton sequence, which may include one or more actions, and has recently emerged as a hot research topic in computer vision. Because actions can be used to accomplish tasks and to express human emotions, action recognition is widely applicable in fields such as intelligent monitoring, human-computer interaction, virtual reality, and smart healthcare. Compared with RGB-based methods, skeleton-based human action recognition methods are less affected by interference factors such as background and human appearance, and they achieve higher accuracy and robustness. In addition, these methods require a small amount of data and show high computational efficiency, which increases their prospects for practical application. A comprehensive and systematic summary and analysis of skeleton-based human action recognition methods is therefore of great importance. Compared with other reviews on skeleton-based action recognition, our contributions are as follows: we provide a more comprehensive summary of skeleton-based action datasets; we provide a more comprehensive summary of skeleton-based action recognition methods, including the latest Transformer-based techniques; we offer a more instructive classification of graph convolutional methods; and we not only summarize existing problems but also forecast prospects for future research. First, we introduce nine datasets commonly used for skeleton-based action recognition: MSR Action3D, MSR Daily Activity 3D, 3D Action Pairs, SYSU 3DHOI, UTD-MHAD, Northwestern-UCLA, NTU RGB+D 60, Skeleton-Kinetics, and NTU RGB+D 120.
To highlight their characteristics, we divide these datasets into single-view and multi-view datasets according to the data collection perspective and then explore the traits and uses of each category. Second, based on the backbone network used by the models, we categorize skeleton-based action recognition methods into those based on handcrafted features, recurrent neural networks (RNNs), convolutional neural networks (CNNs), graph convolutional networks (GCNs), and Transformers. Before the rise of deep learning, traditional algorithms built on handcrafted features were often used to model human skeleton data; the key problem with such methods is how to create an effective feature representation of human skeleton sequences. After deep learning demonstrated excellent performance in fields such as face recognition, image classification, and image super-resolution, researchers began using deep networks to model skeleton data. Among them, RNNs effectively process data in the form of continuous time series and are adept at learning temporal dependencies in sequence data, while CNNs effectively learn high-level semantic information from skeleton data, and training a CNN-based model incurs a lower computational cost than training an RNN. Unlike RNN-based methods, CNN-based methods require the skeleton data to be reshaped into pseudo-images, where the columns of a pseudo-image represent the features of all joints in one frame and the rows represent the features of a given joint across all frames. However, when RNN or CNN methods are used to model skeleton data, the topological structure of the human skeleton is ignored: transforming the skeleton data into sequence vectors of joint coordinates or into a 2D grid cannot accurately describe the dynamic skeleton of the human body.
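The pseudo-image construction described above can be sketched in a few lines of NumPy. This is a minimal illustration, not any specific paper's pipeline; the tensor sizes (64 frames, 25 joints, 3D coordinates, roughly matching NTU RGB+D conventions) are assumptions for the example.

```python
import numpy as np

# Hypothetical skeleton sequence: T frames, V joints, C coordinate channels.
T, V, C = 64, 25, 3
skeleton = np.random.rand(T, V, C).astype(np.float32)

# Pseudo-image for a CNN: row index = joint, column index = frame, so each row
# traces one joint across all frames and each column holds all joints of one
# frame; the (x, y, z) coordinates play the role of image channels.
pseudo_image = skeleton.transpose(1, 0, 2)  # shape (V, T, C)
```

A standard 2D CNN can then treat `pseudo_image` like a V×T image with C channels, which is exactly the reshaping step that discards the skeleton's graph topology.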
Previous studies show that graph convolution has a powerful ability to model topological graph structures, making it particularly suitable for modeling the human skeleton. Given this success, graph convolutional methods have been widely adopted in skeleton-based action recognition. This paper adopts a novel inductive approach and provides a comprehensive review of GCN-based methods, further classifying them according to the problems targeted in the literature with the aim of providing researchers with additional ideas and methods. These studies can be divided into optimization of the graph structure, network lightweighting, optimization of temporal and spatial features, and handling of missing and noisy joints. Finally, this paper summarizes the issues faced by currently available methods from eight aspects, points out their limitations and challenges, and evaluates future development trends, offering insightful prospects for the field. In doing so, this review not only helps readers gain a deep understanding of the current state of the task but also provides valuable guidance for future research in this area.
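The graph-convolution operation at the core of the GCN-based methods surveyed here can be sketched as one layer acting on a toy skeleton. This is an illustrative example of the generic normalized graph convolution, not the exact formulation of any reviewed model; the 5-joint topology and layer widths are assumptions.

```python
import numpy as np

# Toy 5-joint skeleton (hypothetical topology; real datasets such as
# NTU RGB+D define 25 joints with dataset-specific bone connections).
V, C_in, C_out = 5, 3, 8
edges = [(0, 1), (1, 2), (1, 3), (1, 4)]  # bones linking the joints

# Build the symmetric adjacency matrix of the skeleton graph.
A = np.zeros((V, V), dtype=np.float32)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Standard normalized adjacency: add self-loops, then D^{-1/2} (A+I) D^{-1/2}.
A_hat = A + np.eye(V, dtype=np.float32)
D_inv_sqrt = np.diag(A_hat.sum(axis=1) ** -0.5)
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt

# One graph-convolution layer on a single frame of joint coordinates:
# aggregate each joint's neighbors, project with weights W, apply ReLU.
X = np.random.rand(V, C_in).astype(np.float32)   # per-joint input features
W = np.random.rand(C_in, C_out).astype(np.float32)
H = np.maximum(A_norm @ X @ W, 0.0)              # output features per joint
```

Because the aggregation is driven by the skeleton's adjacency matrix, each joint's output depends on its physically connected neighbors, which is precisely the topological information that pseudo-image and sequence-vector representations discard.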
Keywords: action recognition; skeleton information; datasets; deep learning; graph convolutional network (GCN)