Discriminative information incremental learning for air-ground multi-view action recognition
2025, Vol. 30, No. 1, Pages 130-147
Print publication date: 2025-01-16
DOI: 10.11834/jig.230815
Liu W X, Zhong X, Xu X Y, Zhou Z, Jiang K, Wang Z, Bai X. Discriminative information incremental learning for air-ground multi-view action recognition[J]. Journal of Image and Graphics, 2025, 30(1): 130-147.
Objective
With the growing demands of public security and people's livelihood, action recognition in air-ground multi-view scenarios, in which ground devices are combined with aerial devices such as drones, has emerged. Existing methods focus only on the view relationship when the horizontal spatial view changes and ignore the large differences in discriminative action information caused by vertical spatial changes. Because of the difference in observation height, the observed appearance of the same object varies significantly, which poses a major challenge to traditional multi-view action recognition methods when dealing with vertical spatial view changes.
Method
This paper defines significant differences in action appearance as differences in discriminative action information and proposes a discriminative action information incremental learning (DAIL) method for multi-view action recognition in air-ground scenes, which distinguishes ground views from aerial views according to view height and the amount of information. Drawing on the brain-like idea of proceeding "from easy to hard, step by step", the method distills the discriminative action information of different views separately and increments the discriminative action information of ground-view (simple) samples into aerial-view (complex) samples to assist the network in learning the aerial-view samples. A minimal sketch of this easy-to-hard sample partitioning is given below.
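As a rough illustration of the easy-to-hard idea (not the paper's exact procedure), the sketch below partitions training clips into simple (ground-view) and complex (aerial-view) groups by camera height and orders clips by an assumed information score; the `Clip` fields, the height threshold, and the score itself are hypothetical stand-ins for whatever metadata and information measure are actually used.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Clip:
    clip_id: str
    view_height_m: float   # hypothetical camera-height metadata, in meters
    info_score: float      # assumed proxy for the amount of discriminative action information

GROUND_HEIGHT_MAX_M = 2.0  # hypothetical threshold separating ground views from aerial views

def split_by_view(clips: List[Clip]) -> Tuple[List[Clip], List[Clip]]:
    """Split clips into simple (ground-view) and complex (aerial-view) groups by camera height."""
    simple = [c for c in clips if c.view_height_m <= GROUND_HEIGHT_MAX_M]
    complex_ = [c for c in clips if c.view_height_m > GROUND_HEIGHT_MAX_M]
    return simple, complex_

def easy_to_hard_order(clips: List[Clip]) -> List[Clip]:
    """Present information-rich (easier) clips first and information-scarce (harder) clips last."""
    return sorted(clips, key=lambda c: c.info_score, reverse=True)
```

Under these assumptions, a curriculum would first train on the ground-view group returned by split_by_view and then gradually mix in the aerial-view clips in the order produced by easy_to_hard_order.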
Result
Experiments are conducted on two datasets, Drone-Action and UAV (unmanned aerial vehicle). Compared with the current state-of-the-art method SBP (stochastic backpropagation), the accuracy is improved by 18.0% and 16.2% on the two datasets, respectively. Compared with the strong baseline, the proposed method reduces the number of parameters by 2.4 M and the computation by 6.9 G FLOPs on the UAV dataset.
Conclusion
The proposed method shows that enhancing complex samples with simple samples significantly improves the network's feature learning ability, whereas attempting the reverse operation decreases accuracy. This paper discusses the multi-view action recognition task from a new perspective, achieves both effectiveness and efficiency, outperforms representative methods on common aerial-view datasets, and can be extended to other multi-view tasks.
Objective
With the increasing demand for urban security, ground devices are combined with aerial devices, such as drones, to recognize actions in air-ground scenarios. Meanwhile, extensive ground-based camera networks and a wealth of ground surveillance data can offer reliable support to these aerial surveillance devices, and how to effectively utilize the mobility of these aerial devices is a topic that warrants further research. Existing multi-view action recognition methods focus only on the difference in discriminative action information when the horizontal spatial view changes but do not consider the difference when the vertical spatial view changes. The high mobility of aerial devices leads to changes in the vertical spatial perspective. According to the principles of perspective, observing the same object from different heights results in a significant change in appearance. This, in turn, causes substantial differences in the appearance of the same person's actions when observed from high-altitude and ground-level perspectives. These significant variations in action appearance are referred to as differences in discriminative action information, and they pose a challenge for traditional multi-view action recognition methods in effectively addressing vertical spatial perspective changes.
Method
When the viewing perspective and the observed objects lie in the same horizontal plane, the most comprehensive and rich discriminative action information can be observed, and networks can easily learn and comprehend this information. However, when the viewing perspective is in a different horizontal plane from the observed objects, an inclined perspective occurs, resulting in a significant change in action appearance. The transition from a ground-level perspective to an aerial perspective leads to insufficiently observed information and a reduction in discriminative action information, so networks are more likely to misclassify when they attempt to learn and understand it. Therefore, on the basis of the amount of discriminative action information, ground-level perspective information can be considered simple information that is easily learned and understood, while aerial perspective information can be seen as complex information that is challenging to learn and understand.
In fact, the human brain follows a progressive learning process when dealing with various types of information: it prioritizes the processing of simple information and uses the learned simple information to assist in learning complex information. In the vertical spatial multi-view action recognition task, differences in perspective and environmental influences lead to varying amounts of discriminative action information being observed at different heights. We adopt a brain-like approach and rank samples from the aerial perspective according to the amount of discriminative action information they contain. Complex samples contain less discriminative action information and are challenging for networks to learn and understand, whereas simple samples contain more discriminative action information and are easier for networks to learn and comprehend. We then distill discriminative action information separately from simple and complex samples. Within the same action category, despite differences in the amount of discriminative action information between simple and complex samples, the represented action category should share commonalities. Therefore, with the discriminative action information incremental learning method, the rich discriminative action information learned from simple samples is incrementally injected into the feature information of complex samples. This approach addresses the problem of complex samples carrying insufficient discriminative action information and allows complex samples to learn more discriminative action information with the assistance of simple samples, so that networks can learn and understand them easily. This paper thus proposes a discriminative action information incremental learning (DAIL) method for multi-view action recognition in complex air-ground scenes, which distinguishes the ground view from the aerial view on the basis of view height and the amount of information. The method adopts a brain-like learning strategy of "ordered incremental progression" to distill discriminative action information for different views separately: discriminative action information is incremented from the ground-view (simple) samples into the air-view (complex) samples to assist the network in learning and understanding the air-view samples, as sketched below.
Result
The method is experimentally validated on two datasets, Drone-Action and UAV (unmanned aerial vehicle). Compared with the current state-of-the-art method SBP (stochastic backpropagation), the accuracy on the two datasets is improved by 18.0% and 16.2%, respectively. Compared with the strong baseline, our method reduces the parameters by 2.4 M and the FLOPs by 6.9 G on the UAV dataset. To validate the effectiveness of the proposed method in scenarios involving both ground-level and aerial perspectives, we introduce two datasets: N-UCLA (comprising samples exclusively from ground-based cameras with rich discriminative action information) and Drone-Action (comprising a mix of ground-level and aerial samples, where aerial samples contain relatively limited discriminative action information). A joint analysis of discriminative action information ranking is conducted on these datasets. The findings indicate that enhancing complex samples using simpler ones significantly improves the network's feature learning capacity, whereas attempting the reverse reduces accuracy. This observation aligns with the way the human brain processes information, embodying the concept of progressive learning.
Conclusion
In this study, we propose DAIL, a discriminative action information incremental learning method for multi-view action recognition in complex air-ground scenes, which distinguishes the ground view from the aerial view on the basis of view height and the amount of information. Experimental results show that our model outperforms several state-of-the-art multi-view approaches while reducing parameters and computation.
multi-view action recognition; incremental learning; sample classification; discriminative action information; distillation learning
Bin Y R, Cao X, Chen X Y, Ge Y H, Tai Y, Wang C J, Li J L, Huang F Y, Gao C X and Sang N. 2020. Adversarial semantic data augmentation for human pose estimation//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 606-622 [DOI: 10.1007/978-3-030-58529-7_36]
Carreira J and Zisserman A. 2017. Quo vadis, action recognition? A new model and the kinetics dataset//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 4724-4733 [DOI: 10.1109/CVPR.2017.502]
Chen P H, Yang H C, Chen K W and Chen Y S. 2020. MVSNet++: learning depth-based attention pyramid features for multi-view stereo. IEEE Transactions on Image Processing, 29: 7261-7273 [DOI: 10.1109/TIP.2020.3000611]
Cheng F, Xu M Z, Xiong Y J, Chen H, Li X Y, Li W and Xia W. 2022. Stochastic backpropagation: a memory efficient strategy for training video models//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 8291-8300 [DOI: 10.1109/CVPR52688.2022.00812]
Cheng K, Zhang Y F, He X Y, Chen W H, Cheng J and Lu H Q. 2020. Skeleton-based action recognition with shift graph convolutional network//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 180-189 [DOI: 10.1109/CVPR42600.2020.00026]
Chéron G, Laptev I and Schmid C. 2015. P-CNN: pose-based CNN features for action recognition//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 3218-3226 [DOI: 10.1109/ICCV.2015.368]
Dave I R, Chen C and Shah M. 2022. SPAct: self-supervised privacy preservation for action recognition//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 20132-20141 [DOI: 10.1109/CVPR52688.2022.01953]
Fayyaz M, Bahrami E, Diba A, Noroozi M, Adeli E, van Gool L and Gall J. 2021. 3D CNNs with adaptive temporal feature resolutions//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 4729-4738 [DOI: 10.1109/CVPR46437.2021.00470]
Feichtenhofer C. 2020. X3D: expanding architectures for efficient video recognition//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 200-210 [DOI: 10.1109/CVPR42600.2020.00028]
Feichtenhofer C, Fan H Q, Malik J and He K M. 2019. SlowFast networks for video recognition//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 6201-6210 [DOI: 10.1109/ICCV.2019.00630]
Geng Y, Han Z B, Zhang C Q and Hu Q H. 2021. Uncertainty-aware multi-view representation learning//Proceedings of the 35th AAAI Conference on Artificial Intelligence. Virtually: AAAI Press: 7545-7553 [DOI: 10.1609/AAAI.V35I9.16924]
Hara K, Kataoka H and Satoh Y. 2018. Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet?//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 6546-6555 [DOI: 10.1109/CVPR.2018.00685]
Hardy M. 2010. Pareto's law. The Mathematical Intelligencer, 32(3): 38-43 [DOI: 10.1007/s00283-010-9159-2]
Ho M K, Abel D, Correa C G, Littman M L, Cohen J D and Griffiths T L. 2022. People construct simplified mental representations to plan. Nature, 606(7912): 129-136 [DOI: 10.1038/s41586-022-04743-9]
Jhuang H, Gall J, Zuffi S, Schmid C and Black M J. 2013. Towards understanding action recognition//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney, Australia: IEEE: 3192-3199 [DOI: 10.1109/ICCV.2013.396]
Jøsang A. 2016. Subjective Logic: A Formalism for Reasoning under Uncertainty. Artificial Intelligence: Foundations, Theory, and Algorithms. Cham: Springer: 1-326 [DOI: 10.1007/978-3-319-42337-1]
Kong Y, Ding Z M, Li J and Fu Y. 2017. Deeply learned view-invariant features for cross-view action recognition. IEEE Transactions on Image Processing, 26(6): 3028-3037 [DOI: 10.1109/TCSVT.2018.2868123]
Li J F and Zhang F Y. 2015. Human behavior recognition based on directional weighting local space-time features. Journal of Image and Graphics, 20(3): 320-331 [DOI: 10.11834/jig.20150303]
Li K, Wang Y, He Y, Li Y Z, Wang Y, Wang L M and Qiao Y. 2023a. UniFormerV2: spatiotemporal learning by arming image ViTs with video UniFormer//Proceedings of 2023 International Conference on Computer Vision. Paris, France: IEEE: 1-24 [DOI: 10.48550/arXiv.2211.09552]
Li S, He X X, Song W F, Hao A M and Qin H. 2023b. Graph diffusion convolutional network for skeleton based semantic recognition of two-person actions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(7): 8477-8493 [DOI: 10.1109/TPAMI.2023.3238411]
Li T J, Liu J, Zhang W and Duan L Y. 2020. HARD-Net: hardness-aware discrimination network for 3D early activity prediction//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 420-436 [DOI: 10.1007/978-3-030-58621-8_25]
Li T J, Liu J, Zhang W, Ni Y, Wang W Q and Li Z H. 2021. UAV-Human: a large benchmark for human behavior understanding with unmanned aerial vehicles//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 16261-16270 [DOI: 10.1109/CVPR46437.2021.01600]
Lifshitz O and Wolf L. 2021. Sample selection for universal domain adaptation//Proceedings of the 35th AAAI Conference on Artificial Intelligence. Virtually: AAAI Press: 8592-8600 [DOI: 10.1609/AAAI.v35i10.17042]
Lin J, Gan C, Wang K and Han S. 2022. TSM: temporal shift module for efficient and scalable video understanding on edge devices. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(5): 2760-2774 [DOI: 10.1109/TPAMI.2020.3029799]
Liu W X, Zhong X, Jia X M, Jiang K and Lin C W. 2022. Actor-aware alignment network for action recognition. IEEE Signal Processing Letters, 29: 2597-2601 [DOI: 10.1109/LSP.2022.3229646]
Liu X L, Masana M, Herranz L, Van De Weijer J, Lopez A M and Bagdanov A D. 2018. Rotate your networks: better weight consolidation and less catastrophic forgetting//Proceedings of the 24th International Conference on Pattern Recognition. Beijing, China: IEEE: 2262-2268 [DOI: 10.1109/ICPR.2018.8545895]
Luo C, Zhao P, Chen C, Qiao B, Du C, Zhang H Y, Wu W, Cai S W, He B, Rajmohan S and Lin Q W. 2021. PULNS: positive-unlabeled learning with effective negative sample selector//Proceedings of the 35th AAAI Conference on Artificial Intelligence. Virtually: AAAI Press: 8784-8792 [DOI: 10.1609/AAAI.v35i10.17064]
Mazzia V, Angarano S, Salvetti F, Angelini F and Chiaberge M. 2022. Action Transformer: a self-attention model for short-time pose-based human action recognition. Pattern Recognition, 124: #108487 [DOI: 10.1016/J.PATCOG.2021.108487]
Perera A G, Law Y W and Chahl J. 2019. Drone-Action: an outdoor recorded drone video dataset for action recognition. Drones, 3(4): #82 [DOI: 10.3390/DRONES3040082]
Perera A G, Law Y W, Ogunwa T T and Chahl J. 2020. A multiviewpoint outdoor dataset for human action recognition. IEEE Transactions on Human-Machine Systems, 50(5): 405-413 [DOI: 10.1109/THMS.2020.2971958]
Rebuffi S A, Kolesnikov A, Sperl G and Lampert C H. 2017. iCaRL: incremental classifier and representation learning//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 5533-5542 [DOI: 10.1109/CVPR.2017.587]
Shao Z P, Li Y F and Zhang H. 2021. Learning representations from skeletal self-similarities for cross-view action recognition. IEEE Transactions on Circuits and Systems for Video Technology, 31(1): 160-174 [DOI: 10.1109/TCSVT.2020.2965574]
Shi G G, Fu X Y, Cao C Z and Zha Z J. 2023. Alleviating spatial misalignment and motion interference for UAV-based video recognition//Proceedings of the 31st ACM International Conference on Multimedia. Ottawa, Canada: ACM: 193-202 [DOI: 10.1145/3581783.3611799]
Shi H Y, Hou Z J, Chao X and Zhong Z K. 2023. Multimodal spatial-temporal feature representation and its application in action recognition. Journal of Image and Graphics, 28(4): 1041-1055 [DOI: 10.11834/jig.211217]
Shi L, Zhang Y F, Cheng J and Lu H Q. 2019a. Skeleton-based action recognition with directed graph neural networks//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 7904-7913 [DOI: 10.1109/CVPR.2019.00810]
Shi L, Zhang Y F, Cheng J and Lu H Q. 2019b. Two-stream adaptive graph convolutional networks for skeleton-based action recognition//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 12018-12027 [DOI: 10.1109/CVPR.2019.01230]
Shin H, Lee J K, Kim J and Kim J. 2017. Continual learning with deep generative replay//Advances in 31st Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 2994-3003
Simonyan K and Zisserman A. 2014. Two-stream convolutional networks for action recognition in videos//Advances in 27th Neural Information Processing Systems. Montreal, Canada: Curran Associates Inc.: 568-576
Swayamdipta S, Schwartz R, Lourie N, Wang Y Z, Hajishirzi H, Smith N A and Choi Y. 2020. Dataset cartography: mapping and diagnosing datasets with training dynamics//Proceedings of 2020 Conference on Empirical Methods in Natural Language Processing. Online: ACL: 9275-9293 [DOI: 10.18653/v1/2020.emnlp-main.746]
Tran D, Bourdev L, Fergus R, Torresani L and Paluri M. 2015. Learning spatiotemporal features with 3D convolutional networks//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 4489-4497 [DOI: 10.1109/ICCV.2015.510]
Ullah A, Muhammad K, Hussain T and Baik S W. 2021. Conflux LSTMs network: a novel approach for multi-view action recognition. Neurocomputing, 435: 321-329 [DOI: 10.1016/J.NEUCOM.2019.12.151]
Vyas S, Rawat Y S and Shah M. 2020. Multi-view action recognition using cross-view video prediction//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 427-444 [DOI: 10.1007/978-3-030-58583-9_26]
Wang J, Nie X H, Xia Y, Wu Y and Zhu S C. 2014. Cross-view action modeling, learning, and recognition//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 2649-2656 [DOI: 10.1109/CVPR.2014.339]
Wang L M, Xiong Y J, Wang Z, Qiao Y, Lin D H, Tang X O and van Gool L. 2016. Temporal segment networks: towards good practices for deep action recognition//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 20-36 [DOI: 10.1007/978-3-319-46484-8_2]
Wang Q, Sun G, Dong J H, Wang Q Q and Ding Z M. 2022. Continuous multi-view human action recognition. IEEE Transactions on Circuits and Systems for Video Technology, 32(6): 3603-3614 [DOI: 10.1109/TCSVT.2021.3112214]
Wei K, Deng C, Yang X and Li M S. 2021. Incremental embedding learning via zero-shot translation//Proceedings of the 35th AAAI Conference on Artificial Intelligence. Virtually: AAAI Press: 10254-10262 [DOI: 10.1609/AAAI.v35i11.17229]
Wu W H, He D L, Lin T W, Li F, Gan C and Ding E R. 2021. MVFNet: multi-view fusion network for efficient video recognition//Proceedings of the 35th AAAI Conference on Artificial Intelligence. Virtually: AAAI Press: 2943-2951 [DOI: 10.1609/AAAI.v35i4.16401]
Xiao Y, Chen J, Wang Y C, Cao Z G, Zhou J T and Bai X. 2019. Action recognition for depth video using multi-view dynamic images. Information Sciences, 480: 287-304 [DOI: 10.1016/J.INS.2018.12.050]
Xu C, Wu X, Li Y C, Jin Y N, Wang M M and Liu Y. 2021. Cross-modality online distillation for multi-view action recognition. Neurocomputing, 456: 384-393 [DOI: 10.1016/J.NEUCOM.2021.05.077]
Yan S J, Xiong Y J and Lin D H. 2018. Spatial temporal graph convolutional networks for skeleton-based action recognition//Proceedings of the 32nd AAAI Conference on Artificial Intelligence. New Orleans, USA: AAAI Press: 7444-7452 [DOI: 10.1609/AAAI.v32i1.12328]
Yang Y, Zhuang Y T and Pan Y H. 2022. The review of visual knowledge: a new pivot for cross-media intelligence evolution. Journal of Image and Graphics, 27(9): 2574-2588 [DOI: 10.11834/jig.211264]
Yu L, Twardowski B, Liu X L, Herranz L, Wang K, Cheng Y M, Jui S and Van De Weijer J. 2020. Semantic drift compensation for class-incremental learning//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 6980-6989 [DOI: 10.1109/CVPR42600.2020.00701]
Zhang B B, Ge S Y, Wang Q L and Li P H. 2021. Multi-order information fusion method for human action recognition. Acta Automatica Sinica, 47(3): 609-619 [DOI: 10.16383/j.aas.2018.c180265]
Zhao L, Wang Y X, Zhao J P, Yuan L Z, Sun J J, Schroff F, Adam H, Peng X, Metaxas D and Liu T. 2021. Learning view-disentangled human pose representation by contrastive cross-view mutual information maximization//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 12788-12797 [DOI: 10.1109/CVPR46437.2021.01260]
Zhong X, Zhou Z, Liu W X, Jiang K, Jia X M, Huang W X and Wang Z. 2022. VCD: view-constraint disentanglement for action recognition//Proceedings of 2022 IEEE International Conference on Acoustics, Speech and Signal Processing. Singapore: IEEE: 2170-2174 [DOI: 10.1109/ICASSP43922.2022.9747610]
Zhou B and Li J F. 2020. Human action recognition combined with object detection. Acta Automatica Sinica, 46(9): 1961-1970 [DOI: 10.16383/j.aas.c180848]