A high-order graph convolutional network for homomorphic and heterogeneous skeletal motion retargeting
2024, Vol. 29, No. 12, Pages 3712-3726
Print publication date: 2024-12-16
DOI: 10.11834/jig.230909
Jia Wei, Li Jun, Li Shujie, Zhao Yang, Min Hai. 2024. A high-order graph convolutional network for homomorphic and heterogeneous skeletal motion retargeting. Journal of Image and Graphics, 29(12):3712-3726
Objective
Skeletal motion retargeting is a key technology that involves adapting skeletal motion data from a source character, after suitable modification, to a target character with a different skeleton structure, thereby ensuring that the target character performs actions identical to the source. This process, which is particularly crucial in animation production and game development, can greatly promote the reuse of existing motion data and significantly reduce the need to create new motion data from scratch. Skeletal motion data have an inherently strong relationship with a character’s skeleton structure, and the core challenge in retargeting lies in extracting motion data features that are independent of the source skeleton and solely embody the essence and pattern of the action. The complexity in this process increases markedly during practical applications, especially when the source and target characters stem from distinct datasets (e.g., translating motion capture data from real human subjects onto virtual animated characters with heterogeneous skeletal structures). The differences between such datasets extend beyond mere skeletal disparities and may encompass inconsistencies in capturing equipment, physiological variations among individuals, and diverse action execution environments. Collectively, these factors produce significant discrepancies between the source and target characters in terms of global movement ranges, joint angle variation range, and other motion attributes, thus posing formidable challenges for retargeting algorithms. This paper addresses the problem of overcoming data heterogeneity to enable a precise motion retargeting from real human motion data to heterogeneous yet topologically equivalent virtual animated characters. To this end, this paper proposes several strategies for feature separation and high-order skeletal convolution operators.
Method
During the data preprocessing stage, feature separation is applied to the motion data to isolate the components that are independent of the skeletal structure. This approach significantly reduces the complexity of the data, lowers the difficulty of the heterogeneous retargeting task, and facilitates superior retargeting outcomes. Moreover, given the high sensitivity of motion retargeting tasks to local features, this paper investigates the distance information between joints and, drawing on higher-order graph convolution theory, introduces improvements to conventional skeletal convolution methods, ultimately proposing a novel high-order skeletal convolution operator. In high-order graph convolutional operations, the higher powers of the adjacency matrix encode richer and more concrete information: they capture not only the fundamental structural information, i.e., direct adjacencies between nodes, but also the multi-level distance relationships among nodes. The new operator harnesses the rich adjacency relationships and distance information encapsulated in these higher-order adjacency matrices, enabling the convolution to extract the intrinsic structural features of the skeleton more thoroughly and thereby enhancing the accuracy and visual quality of the retargeting results.
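To make the idea of a high-order graph convolution concrete, the following is a minimal numerical sketch (not the paper's exact operator; all function and variable names are illustrative) of a MixHop-style layer that aggregates features over successive powers of a normalized skeleton adjacency matrix, giving each hop order its own weight matrix:

```python
import numpy as np

def normalize_adj(A):
    # Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def high_order_gcn_layer(X, A, weights, order=3):
    """One high-order graph convolution: features are aggregated over
    neighborhoods of increasing hop distance (powers 1..order of the
    normalized adjacency), each hop with its own weight matrix."""
    A_norm = normalize_adj(A)
    outputs = []
    A_pow = np.eye(A.shape[0])
    for k in range(order):
        A_pow = A_pow @ A_norm                    # (k+1)-th power
        outputs.append(A_pow @ X @ weights[k])
    return np.maximum(np.concatenate(outputs, axis=1), 0.0)  # ReLU

# Toy 5-joint chain skeleton (e.g., one limb): edges 0-1, 1-2, 2-3, 3-4
A = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    A[i, j] = A[j, i] = 1.0

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))                   # per-joint features
W = [rng.standard_normal((8, 4)) * 0.1 for _ in range(3)]
out = high_order_gcn_layer(X, A, W, order=3)
print(out.shape)  # (5, 12): 3 hop orders x 4 output channels each
```

Assigning a separate weight matrix to each hop order lets the layer treat joints at different skeletal distances differently, which is the intuition behind the higher-order operator described above.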
Result
In the heterogeneous motion retargeting task, the proposed algorithm demonstrates a significant improvement (38.6%) in retargeting accuracy over current state-of-the-art methods when evaluated on the synthetic animation dataset Mixamo. To further characterize the model, the root joint errors are analyzed to assess its precision in handling the root joint position. Results show that, relative to existing methods, the proposed algorithm reduces the root joint position errors by 35.5%, substantiating its exceptional capability in retargeting tasks with large ranges of root joint position variation. The algorithm also demonstrates its applicability and superiority in homogeneous motion retargeting tasks, achieving 74.8% higher accuracy than existing methods. In practical applications, when retargeting real-world motion data captured from humans onto virtual animated characters in a heterogeneous setting, our algorithm faithfully reproduces the specific actions and significantly reduces retargeting errors.
Conclusion
This paper presents a framework that is capable of handling challenging motion retargeting tasks between heterogeneous yet topologically equivalent skeletons. When the training data originate from two significantly diverse datasets, the proposed data preprocessing methods and high-order skeletal convolutional operators enable the neural network models to effectively extract motion features from the source data and integrate them into the target skeleton, thereby generating skeletal motion data for the target character. By separating features of the motion data that are independent of the skeleton structure, the proposed model can focus on structure-relevant information, thereby effectively decoupling motion information from structural details and achieving motion retargeting. Additionally, by assigning different weights to joints at varying distances, the high-order skeletal convolutional operators gather enhanced skeletal structural information to improve network performance.
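As an illustration of separating skeleton-independent features (a generic sketch, not the paper's exact preprocessing; the helper below is hypothetical), one common normalization converts per-frame joint positions into unit bone-direction vectors, discarding bone lengths, which depend on the specific skeleton:

```python
import numpy as np

def to_unit_bone_directions(positions, parents):
    """Convert per-frame joint positions (frames, joints, 3) into unit
    bone-direction vectors, discarding bone lengths, which are a
    skeleton-dependent factor."""
    dirs = []
    for j, p in enumerate(parents):
        if p < 0:
            continue  # skip the root joint, which has no parent bone
        bone = positions[:, j] - positions[:, p]           # (frames, 3)
        length = np.linalg.norm(bone, axis=-1, keepdims=True)
        dirs.append(bone / np.maximum(length, 1e-8))
    return np.stack(dirs, axis=1)  # (frames, joints - 1, 3)

# Toy 3-joint chain, 2 frames; parents[j] is the parent index of joint j
parents = [-1, 0, 1]
pos = np.array([
    [[0, 0, 0], [0, 1, 0], [0, 3, 0]],
    [[0, 0, 0], [1, 0, 0], [1, 2, 0]],
], dtype=float)
d = to_unit_bone_directions(pos, parents)
print(d.shape)  # (2, 2, 3); every direction vector has unit length
```

Because the resulting directions are independent of bone lengths, two characters with the same topology but different proportions map to comparable representations, which is the kind of decoupling the framework relies on.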
deep learning; motion retargeting; graph convolutional network; autoencoder; Human3.6M; motion data