Local spatiotemporal convolutional network for gait recognition
2025, pp. 1-16
Published online: 2025-01-23; Accepted: 2025-01-16
DOI: 10.11834/jig.240710
Ding Xinnan, Ye Nan, Duan Xin, Wang Kejun. Local spatiotemporal convolutional network for gait recognition[J/OL]. Journal of Image and Graphics, 2025: 1-16.
Objective
As a biometric trait, gait can distinguish identities by the habitual way a subject walks. However, owing to the complexity of video data, the motion state hidden in consecutive frames, which is unaffected by external covariates, is difficult to capture directly. To address this problem, this paper proposes a gait recognition method based on local spatiotemporal convolution that learns gait motion patterns autonomously.
Method
Inspired by the idea of partitioning, a global bidirectional spatial pooling method is proposed to reduce the dimensionality of the gait tensor, using local strips as the basic units to represent the spatial details of gait features. A local spatiotemporal convolutional layer is designed so that information from both the temporal and spatial domains jointly participates in the convolution, allowing a two-dimensional convolutional layer to adaptively learn strip-based gait motion. Asymmetric convolutions attend separately to the temporal, spatial, and spatiotemporal domains, so spatiotemporal gait features can be extracted more effectively. In addition, a local spatiotemporal pooling method is proposed to fuse the most discriminative local spatiotemporal gait representations across frames, generating gait features with stronger identity discriminability.
Result
The proposed method achieves the highest recognition accuracy on two public benchmark datasets: average recognition accuracies of 97.3%, 93.7%, and 83.8% under the three walking conditions of the CASIA-B dataset, and an average accuracy of 85.8% on OU-MVLP, demonstrating the superiority of the proposed method.
Conclusion
The proposed local spatiotemporal convolutional network has strong spatiotemporal feature learning capability and improves gait recognition accuracy.
Objective With the development of technology and the expansion of application scenarios, biometric recognition is gradually becoming one of the mainstream technologies for future security authentication. Gait is a biometric trait that can distinguish individuals by their walking patterns. However, capturing the motion pattern hidden within a series of frames is challenging because of the complexity of video data. Existing gait recognition methods struggle to learn gait motion habits under a wide range of recognition conditions and to achieve good real-time performance, owing to random and diverse external factors such as complex backgrounds, pedestrian clothing, walking directions, and illumination changes. To address this issue, we propose a gait recognition method based on a local spatiotemporal convolutional network that autonomously learns gait motion patterns.
Method
Unlike approaches that introduce handcrafted motion features, this network directly endows a two-dimensional convolutional network with the ability to extract temporal information. It adaptively learns the complex underlying structures and patterns in video data in a data-driven manner to capture motion features, and it can adapt to different gait data through continued learning, which makes the model highly general. Specifically, inspired by the partitioning idea, a global bidirectional spatial pooling method is proposed to reduce the dimensionality of gait tensors, with local strips employed as the fundamental units describing details of the gait space. Global bidirectional spatial pooling divides gait features into horizontal and vertical local features in the spatial domain, using partitioning to attend to gait details while reducing dimensionality. On this basis, a local spatiotemporal convolutional layer is designed to integrate the spatial and temporal domains, allowing adaptive learning of strip-based gait motion. Local spatiotemporal convolution involves the spatial, channel, and temporal domains in two-dimensional convolution operations; this novel layer lets both the temporal and spatial dimensions participate in the learning of the convolutional network, enabling a two-dimensional convolutional structure to capture spatiotemporal gait features. Asymmetric convolution is also extended to local spatiotemporal convolution, yielding asymmetric local spatiotemporal convolutional layers. Because asymmetric convolution explicitly strengthens the representational power of standard square kernels, integrating it allows spatial features to be extracted more effectively. In addition, a local spatiotemporal pooling method combines the most discriminative local spatiotemporal gait representations from multiple frames to generate more identity-discriminative gait features.
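The strip-based reduction described above can be illustrated with a minimal numpy sketch. The function name, the use of mean pooling, and the tensor layout (channels, frames, height, width) are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def global_bidirectional_spatial_pooling(x):
    """Collapse a gait feature tensor (C, T, H, W) into local strips.

    Pooling over the width yields H horizontal strips and pooling over
    the height yields W vertical strips; concatenating the two gives a
    (C, T, H + W) tensor, so each frame is summarized by 1-D strips
    instead of a full 2-D map. (Mean pooling is an assumption here;
    the paper's operator may differ.)
    """
    horizontal = x.mean(axis=3)  # (C, T, H): one value per row of the map
    vertical = x.mean(axis=2)    # (C, T, W): one value per column
    return np.concatenate([horizontal, vertical], axis=2)  # (C, T, H + W)

# Hypothetical example: 16 channels, 30 frames, 64x44 silhouette features.
feat = np.random.rand(16, 30, 64, 44)
strips = global_bidirectional_spatial_pooling(feat)
print(strips.shape)  # (16, 30, 108)
```

After this step the spatial extent of each frame is a vector of strips rather than an image, which is what later allows a plain 2-D convolution to span frames and strips simultaneously.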
In this way, dimensionality reduction of the gait tensor is achieved, the temporal dimension becomes able to participate in two-dimensional convolution, and the spatial details of the gait representation are integrated, reducing the loss of spatial features.
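A single-channel sketch of how a 2-D convolution can mix time and strips, with asymmetric branches and frame-wise pooling, is shown below. The kernel sizes (3x1, 1x3, 3x3), the averaging weights, and max pooling over frames are illustrative assumptions; the paper's learned kernels and fusion details may differ:

```python
import numpy as np

def conv2d_same(x, k):
    """'Same'-padded 2-D cross-correlation of a (T, S) map with kernel k."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.empty(x.shape, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def local_spatiotemporal_conv(x):
    """Asymmetric local spatiotemporal convolution for one channel.

    x is a (T, S) map of strip features: rows are frames, columns are
    strips, so an ordinary 2-D kernel mixes time and space at once.
    Three branches attend to the temporal axis (3x1), the strip axis
    (1x3), and the joint spatiotemporal plane (3x3); their responses
    are summed, mimicking asymmetric-convolution fusion.
    """
    k_t = np.ones((3, 1)) / 3.0   # temporal-only branch
    k_s = np.ones((1, 3)) / 3.0   # spatial (strip) branch
    k_ts = np.ones((3, 3)) / 9.0  # joint spatiotemporal branch
    return conv2d_same(x, k_t) + conv2d_same(x, k_s) + conv2d_same(x, k_ts)

# Hypothetical 30-frame sequence of 108 strip features for one channel.
x = np.random.rand(30, 108)
y = local_spatiotemporal_conv(x)  # (30, 108) spatiotemporal response
pooled = y.max(axis=0)            # local spatiotemporal pooling over frames
print(y.shape, pooled.shape)
```

Taking the maximum over the frame axis keeps, for each strip, the most salient response across the whole sequence, which is one simple way to fuse discriminative local representations from multiple frames.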
Result
Extensive experiments on gait benchmark datasets demonstrate the effectiveness of each component of the designed network, and comparisons with other recent methods confirm the superiority of the approach. On two public benchmark datasets, the proposed method outperforms current gait recognition approaches: it achieves the best recognition performance under the three training settings of the CASIA-B dataset, with average recognition accuracies of 97.3%, 93.7%, and 83.8% under the three walking scenarios, and an average accuracy of 85.8% on OU-MVLP.
Conclusion
The experimental results show that the local spatiotemporal convolutional network has strong spatiotemporal feature learning capacity and improves gait recognition performance. The proposed network can adaptively capture spatiotemporal gait features, providing a new method for research in the field of gait recognition.
Keywords: gait recognition; spatiotemporal features; convolutional neural network; local features; deep learning