情智兼备数字人与机器人研究进展
Research Advancements on Emotionally and Intellectually Integrated Digital Humans and Robotics
2025, Pages: 1-21
Received: 2024-12-27; Revised: 2025-02-25; Accepted: 2025-03-10; Published online: 2025-03-12
DOI: 10.11834/jig.240780
Emotionally and intellectually integrated digital human and robotic technologies aim to develop intelligent systems capable of emotional understanding and personalized response, a direction that has increasingly become a focus of research in academia and across society. This paper systematically analyzes the current state and progress of these technologies in four areas: brain-cognition-driven emotional mechanisms, the fusion and interpretation of multimodal emotional-intelligence large models, personalized emotional representation and dynamic computation, and the controllable generation of interactive emotional content. Looking ahead, emotionally and intellectually integrated digital humans and robots promise broad applications in fields such as medical companionship, intelligent education, and mental health, and will play an important role in improving the naturalness of human-computer interaction, personalized services, and user experience.
The development of emotionally intelligent digital humans and robotic technologies represents a significant advancement in contemporary research, focusing on the creation of systems capable of understanding and responding to human emotions in a nuanced manner. This paper systematically analyzes the current research status and advancements in four key areas: brain-cognition-driven emotional mechanisms, the integration and interpretation of multimodal emotional intelligence models, personalized emotional representation and dynamic computation, and the regulation of interactive emotional content generation. The brain-cognition-driven emotional mechanisms highlight the critical need to understand emotional characteristics and dynamic regulatory processes across various brain regions. Recent advances in neuroimaging technologies, such as functional magnetic resonance imaging (fMRI) and electroencephalography (EEG), have yielded deeper insights into the activation patterns of these areas in response to different emotional stimuli. For instance, the amygdala's association with fear responses contrasts with the prefrontal cortex’s role in emotional regulation and cognitive control, emphasizing the importance of identifying these unique functions for the development of accurate emotion recognition systems capable of real-time emotional state analysis. The integration and interpretation of multimodal emotional intelligence models are also essential for enhancing emotion recognition capabilities. The ability to synthesize information from diverse sources, including audio, video, text, and physiological signals, provides a more robust understanding of emotional expressions. By analyzing vocal tone alongside facial expressions and contextual text data, models can achieve superior accuracy in identifying a spectrum of emotions such as happiness, sadness, or anger. 
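The multimodal synthesis described above can be made concrete with a minimal sketch of decision-level (late) fusion, one common baseline for combining per-modality emotion predictions. The modality names, weights, and three-emotion inventory below are illustrative assumptions, not specifics from the survey.

```python
# Hypothetical late-fusion example: each modality-specific classifier
# emits a probability distribution over a small emotion set, and the
# fused estimate is their normalized weighted average.

EMOTIONS = ["happiness", "sadness", "anger"]

def fuse_predictions(modality_probs, weights):
    """Weighted decision-level fusion of per-modality distributions
    over EMOTIONS into a single distribution."""
    fused = [0.0] * len(EMOTIONS)
    total = sum(weights[m] for m in modality_probs)
    for modality, probs in modality_probs.items():
        w = weights[modality] / total  # normalize weights over present modalities
        for i, p in enumerate(probs):
            fused[i] += w * p
    return fused

# The audio channel is ambiguous, but face and text agree on happiness,
# so the fused distribution resolves the ambiguity.
preds = {
    "audio": [0.4, 0.3, 0.3],
    "face":  [0.7, 0.2, 0.1],
    "text":  [0.6, 0.3, 0.1],
}
weights = {"audio": 1.0, "face": 2.0, "text": 1.0}
fused = fuse_predictions(preds, weights)
label = EMOTIONS[max(range(len(fused)), key=fused.__getitem__)]  # "happiness"
```

More sophisticated schemes fuse at the feature or model level (e.g. cross-modal attention), but the weighting idea, letting a more reliable modality dominate, carries over.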
This paper delves into methodologies for aligning and fusing cross-modal emotional data, showcasing how techniques like deep learning and Transformer architectures address challenges related to differences in modal features and temporal synchronization. For example, ensuring that emotional cues from video and audio data are accurately aligned in time can significantly enhance the overall recognition process. The discussion further explores the application of large models, particularly their capabilities in transfer learning and self-supervised learning, which enable these systems to adapt to new emotional contexts with minimal additional training. Such adaptability not only improves the naturalness of emotional expressions but also addresses critical privacy concerns associated with processing emotional data. In addition to these foundational elements, the paper emphasizes personalized emotional representation and dynamic computation. By capturing individual emotional traits, such as those related to gender, age, cultural background, and personality type, models can create more accurate emotional profiles tailored to specific users. This individualized approach is particularly relevant in areas like mental health support, where a nuanced understanding of a user’s emotional landscape can significantly enhance intervention effectiveness. The integration of social relationships and environmental stimuli into emotional analysis is also discussed, highlighting how contextual factors influence emotional responses and lead to more appropriate system reactions. Hierarchical knowledge-guided technologies are highlighted as enabling systems to respond to complex emotional scenarios, fostering more nuanced and context-aware interactions. Additionally, adaptive dynamic modeling techniques for emotional states introduce temporal dimensions into emotion processing, allowing real-time adjustments that ensure responses remain relevant and sensitive to user needs. 
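The temporal-synchronization problem mentioned above can be illustrated with dynamic time warping (DTW), a classic method for aligning two feature sequences sampled at different rates; the valence curves below are invented toy data, and DTW stands in here for the learned alignment mechanisms the survey actually covers.

```python
# Classic O(n*m) dynamic time warping between two 1-D sequences,
# using absolute difference as the local cost. A low DTW cost means
# the two sequences trace the same emotional trajectory even though
# their samples are not in one-to-one temporal correspondence.

def dtw_distance(a, b):
    n, m = len(a), len(b)
    INF = float("inf")
    # dp[i][j] = cost of the best alignment of a[:i] with b[:j]
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j],      # stretch a
                                  dp[i][j - 1],      # stretch b
                                  dp[i - 1][j - 1])  # advance both
    return dp[n][m]

# A valence curve estimated from video, and the same rise-and-fall
# sampled at roughly twice the rate from audio:
video_valence = [0.0, 0.5, 1.0, 0.5, 0.0]
audio_valence = [0.0, 0.2, 0.5, 0.8, 1.0, 0.8, 0.5, 0.2, 0.0]
cost = dtw_distance(video_valence, audio_valence)
```

Identical sequences align at zero cost, and the differently-sampled pair above aligns cheaply despite having different lengths, which is exactly the property temporal alignment of cross-modal emotional cues needs.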
The regulation of interactive emotional content generation is another critical aspect of this review, aiming to develop intelligent systems that can understand and produce multimodal emotional content. Key components include constructing emotional spaces for precise emotion representation, which involves defining both discrete categories and continuous dimensions of emotional expression. This dual approach enhances the ability of systems to capture a wide range of emotional nuances, facilitating more accurate and relatable interactions. Furthermore, the paper examines controllable interaction technologies in emotional generation, particularly advancements in Generative Adversarial Networks (GANs) and diffusion models, which allow for the introduction of emotional conditions that guide content generation. This capability enhances the flexibility and relevance of emotional responses, enabling digital humans to adjust their expressions based on users’ emotional states for more engaging interactions. Utilizing multimodal reasoning is a crucial element of the discussion, as it leverages the inferential capabilities of multimodal large models to effectively align and generate cross-modal emotional information. This enriches generated content and ensures resonance with users across visual, auditory, and textual levels. The paper also addresses strategies for minimizing computational resources while maintaining content quality, essential for deploying these advanced systems in real-world applications where efficiency is paramount. In conclusion, emotionally intelligent digital humans signify a transformative advancement in human-computer interaction, with the potential to significantly enhance user engagement and satisfaction. By integrating high-fidelity digital reconstruction, controllable emotional expression, and intelligent interaction capabilities, these systems can facilitate more natural and effective user engagement. 
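The dual emotional-space idea, discrete categories alongside continuous dimensions, can be sketched by placing category labels in a valence-arousal plane and mapping continuous estimates back to the nearest label. The coordinates below are rough, commonly used placements chosen for illustration, not values from the paper.

```python
# Hypothetical bridge between a discrete emotion inventory and a
# continuous valence-arousal plane: each category gets an anchor
# point in [-1, 1]^2, and a continuous estimate is labeled with the
# nearest anchor.
import math

CATEGORY_COORDS = {           # (valence, arousal), illustrative values
    "happiness": ( 0.8,  0.5),
    "sadness":   (-0.7, -0.4),
    "anger":     (-0.6,  0.7),
    "calm":      ( 0.4, -0.6),
}

def nearest_category(valence, arousal):
    """Map a continuous (valence, arousal) estimate to the closest
    discrete label by Euclidean distance."""
    return min(CATEGORY_COORDS,
               key=lambda c: math.dist(CATEGORY_COORDS[c], (valence, arousal)))
```

In a generation pipeline, the continuous point could condition a generator (e.g. as a conditioning vector for a diffusion model), while the discrete label supports user-facing control such as "respond cheerfully"; the lookup above lets the two representations interoperate.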
Future developments in this field are likely to focus on enhancing the realism of digital human interactions and improving the adaptability of emotional expressions based on user feedback. As technology continues to progress, the potential applications of emotionally intelligent digital humans and robotics will expand across various domains, including healthcare, where they can provide companionship and emotional support; education, where they can personalize learning experiences; and entertainment, where they can create immersive environments. Ultimately, these advancements promise enriched user experiences and deeper emotional connections, paving the way for a future where emotionally intelligent systems become integral to daily life.
Tao J H , Fan C H , Lian Z , Lyu Z , Shen Y , Liang S . 2024 . Development of multimodal sentiment recognition and understanding. . Journal of image and graphics , 29 ( 6 ): 1607 - 1627
陶建华 , 范存航 , 连政 , 吕钊 , 沈莹 , 梁山 . 多模态情感识别与理解发展现状及趋势 . 2024 . 中国图象图形学报 , 29 ( 6 ): 1607- 1627 [ DOI: 10.11834/jig.240017 http://dx.doi.org/10.11834/jig.240017 ]
Berridge C , Zhou Y J , Robillard J M , Kaye J . 2023 . Companion robots to mitigate loneliness among older adults: Perceptions of benefit and possible deception. Frontiers in Psychology 14:1106633. [ DOI: 10.3389/fpsyg.2023.1106633 http://dx.doi.org/10.3389/fpsyg.2023.1106633 ]
Park S , Whang M . 2022 . Empathy in Human-Robot Interaction: Designing for Social Robots . International Journal of Environmental Research and Public Health 19 ( 3 ): 1889 . [ DOI: 10.3390/ijerph19031889]
Loveys K , Sagar M , Zhang X Y , Fricchione G , Broadbent E . 2021 . Effects of Emotional Expressiveness of a Female Digital Human on Loneliness, Stress, Perceived Support, and Closeness Across Genders: Randomized Controlled Trial. Journal of Medical Internet Research 5 ; 23 ( 11 ): e30624 . [ DOI: 10.2196/30624 http://dx.doi.org/10.2196/30624 ]
Yun J and Park J . 2022 . The Effects of Chatbot Service Recovery With Emotion Words on Customer Satisfaction, Repurchase Intention, and Positive Word-Of-Mouth . Frontiers in Psychology 13 : 922503 . [ DOI: 10.3389/fpsyg.2022.922503 http://dx.doi.org/10.3389/fpsyg.2022.922503 ]
Dalgleish T . 2004 . The emotional brain . Nature Reviews Neuroscience Jul ; 5 ( 7 ): 583 - 9 . [ DOI: 10.1038/nrn1432]
Lindquist K A , Wager T D , Kober H , Bliss-Moreau E , Barrett L F . 2012 . The brain basis of emotion: a meta-analytic review. Behavioral and Brain Sciences 2012 Jun; 35 ( 3 ): 121 - 43 . [ DOI: 10.1017/S0140525X11000446 http://dx.doi.org/10.1017/S0140525X11000446 ]
LeDoux J . Rethinking the emotional brain . Neuron , 73 ( 4 ): 653 - 676 . [ DOI: 10.1016/j.neuron.2012.02.004 http://dx.doi.org/10.1016/j.neuron.2012.02.004 ]
Berluti K , Ploe M L , Marsh A A . 2023 . Emotion processing in youths with conduct problems: an fMRI meta-analysis . Translational Psychiatry , 13 ( 1 ): 105 . [ DOI: 10.1038/s41398-023-02363-z http://dx.doi.org/10.1038/s41398-023-02363-z ]
Wen C S , Jia G L , Yang J F . 2023 . Dip: Dual incongruity perceiving network for sarcasm detection // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver, Canada : IEEE: 2540 - 2550 . [ DOI: 10.1109/CVPR52729.2023.00250 http://dx.doi.org/10.1109/CVPR52729.2023.00250 ]
Luo Y , Zhu L Z , Wan Z Y , Lu B L . 2020 .. Data augmentation for enhancing EEG-based emotion recognition with deep generative models . Journal of Neural Engineering , 17 ( 5 ): 056021 . [ DOI: 10.1088/1741-2552/abb580 http://dx.doi.org/10.1088/1741-2552/abb580 ]
LeDoux J E . 2000 . Emotion circuits in the brain . Annual Review of Neuroscience ; 23 : 155 - 84 . [ DOI: 10.1146/annurev.neuro.23.1.155 http://dx.doi.org/10.1146/annurev.neuro.23.1.155 ]
Wang Z , Wang Y X , Zhang J P , Hu C F , Yin Z , Song Y . 2022 . Spatial-temporal feature fusion neural network for EEG-based emotion recognition . IEEE Transactions on Instrumentation and Measurement , 71 : 1 - 12 . [ DOI: 10.1109/TIM.2022.3165280 http://dx.doi.org/10.1109/TIM.2022.3165280 ]
Abdullah S M S A , Ameen S Y A , Sadeeq M A M , Zeebaree S R M . 2021 . Multimodal emotion recognition using deep learning . Journal of Applied Science and Technology Trends , 2 ( 01 ): 73 - 79 .[ DOI: 10.38094/jastt20291 http://dx.doi.org/10.38094/jastt20291 ]
Phelps E A . 2004 . Human emotion and memory: interactions of the amygdala and hippocampal complex . Current opinion in neurobiology , 14 ( 2 ): 198 - 202 .[ DOI: 10.1016/j.conb.2004.03.015 http://dx.doi.org/10.1016/j.conb.2004.03.015 ]
Zhang J H , Yin Z , Chen P , Nichele S . 2020 . Emotion recognition using multi-modal data and machine learning techniques: A tutorial and review . Information Fusion 59 ( 1 ): 103 - 126 .[ DOI: 10.1016/j.inffus.2020.01.011 http://dx.doi.org/10.1016/j.inffus.2020.01.011 ]
Kan X, Cui H J, Lukemire J, Guo Y,Yang C. Fbnetgen . 2022 . Task-aware gnn-based fmri analysis via functional brain network generation //International Conference on Medical Imaging with Deep Learning. Paris, France : PMLR: 618 - 637 .[ DOI: 10.48550/arXiv.2205.12465 http://dx.doi.org/10.48550/arXiv.2205.12465 ]
Zhang H , Song R , Wang L P , Zhang L , Wang D W , Wang C . 2022 . Classification of brain disorders in rs-fMRI via local-to-global graph neural networks . IEEE Transactions on Medical Imaging , 42 ( 2 ): 444 - 455 [ DOI: 10.1109/TMI.2022.3219260 http://dx.doi.org/10.1109/TMI.2022.3219260 ]
Zu C , Gao Y , Munsell B , Kim M , Peng Z W , Zhu Y Y , Gao W , Zhang D Q , Shen D G , Wu G R . 2016 . Identifying High Order Brain Connectome Biomarkers via Learning on Hypergraph // Proceedings of the 7th International Workshop on Machine Learning in Medical Imaging , Held in Conjunction with MICCAI 2016. Cham, Switzerland : Springer International Publishing: 1 – 9 [ DOI: 10.1007/978-3-319-47157-0_1 http://dx.doi.org/10.1007/978-3-319-47157-0_1 ]
Han X M , Xue R D , Du S Y , Gao Y . 2024 . Inter-intra High-Order Brain Network for ASD Diagnosis via Functional MRIs // Proceedings of the 2024 International Conference on Medical Image Computing and Computer-Assisted Intervention . Cham, Switzerland : Springer: 216 - 226 [ DOI: 10.1007/978-3-031-72069-7_21 http://dx.doi.org/10.1007/978-3-031-72069-7_21 ]
Barrett L F , Satpute A B . 2013 . Large-scale brain networks in affective and social neuroscience: towards an integrative functional architecture of the brain . Current Opinion in Neurobiology , 23 ( 3 ): 361 - 372 . [ DOI: 10.1016/j.conb.2012.12.012 http://dx.doi.org/10.1016/j.conb.2012.12.012 ]
Friston K J . 2011 . Functional and effective connectivity: a review . Brain connectivity , 1 ( 1 ): 13 - 36 .[ DOI: 10.1089/brain.2011.0008 http://dx.doi.org/10.1089/brain.2011.0008 ]
Betzel R F . 2022 . Network neuroscience and the connectomics revolution // Connectomic deep brain stimulation . Academic Press : 25 - 58 .[ DOI: 10.1016/B978-0-12-821861-7.00002-6 http://dx.doi.org/10.1016/B978-0-12-821861-7.00002-6 ]
Van Haeringen E S , Gerritsen C , Hindriks K V . 2023 . Emotion contagion in agent-based simulations of crowds: a systematic review . Autonomous Agents and Multi-Agent Systems , 37 ( 1 ): 6 . [ DOI: 10.1007/s10458-022-09589-z http://dx.doi.org/10.1007/s10458-022-09589-z ]
Momennejad I . 2022 . Collective minds: social network topology shapes collective cognition . Philosophical Transactions of the Royal Society B , 377 ( 1843 ): 20200315 .[ DOI: 10.1098/rstb.2020.0315 http://dx.doi.org/10.1098/rstb.2020.0315 ]
Baltrušaitis T , Ahuja C , Morency L P . 2018 . Multimodal machine learning: A survey and taxonomy . IEEE Transactions on Pattern Analysis and Machine Intelligence , 41 ( 2 ): 423 - 443 .[ DOI: 10.1109/TPAMI.2018.2798607 http://dx.doi.org/10.1109/TPAMI.2018.2798607 ]
Liang P P , Zadeh A , Morency L P . 2022 . Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions. arXiv preprint arXiv: 2209.03430 , 56 ( 10 ) [DOI:10.1145/3656580 ]
Zhao S C , Jia G L , Yang J F , Ding G G , Keutzer K . 2021 . Emotion recognition from multiple modalities: Fundamentals and methodologies . IEEE Signal Processing Magazine , 38 ( 6 ): 59 - 73 .[ DOI: 10.1109/MSP.2021.3106895 http://dx.doi.org/10.1109/MSP.2021.3106895 ]
Zhao S C , Yao X X , Yang J F , Jia G L , Ding G G , Chua T . 2022 . Affective image content analysis: Two decades review and new perspectives . IEEE Transactions on Pattern Analysis and Machine Intelligence , 44 ( 10 ): 6729 - 6751 . [ DOI: 10.1109/TPAMI.2021.3094362 http://dx.doi.org/10.1109/TPAMI.2021.3094362 ]
D'mello S K , Kory J . 2015 . A review and meta-analysis of multimodal affect detection systems . ACM Computing Surveys , 47 ( 3 ): 1 - 36 .[ DOI: 10.1145/2682899 http://dx.doi.org/10.1145/2682899 ]
Zhao S C , Hong X P , Yang J F , Zhao Y Y , Ding G G . 2023 . Toward Label-Efficient Emotion and Sentiment Analysis . Proceedings of the IEEE , 111 ( 10 ): 1159 - 1197 .[ DOI: 10.1109/JPROC.2023.3309299 http://dx.doi.org/10.1109/JPROC.2023.3309299 ]
Zhu T , Li L D , Yang J F , Zhao S C , Liu H T , Qian J S . 2023 . Multimodal sentiment analysis with image-text interaction network . IEEE Transactions on Multimedia , 25 : 3375 - 3385 .[ DOI: doi: 10.1109/TMM.2022.3160060 http://dx.doi.org/doi:10.1109/TMM.2022.3160060 ]
Zhu T , Li L D , Yang J F , Zhao S C , Xiao X . 2023 . Multimodal emotion classification with multi-level semantic reasoning network . IEEE Transactions on Multimedia , 25 : 6868 - 6880 .[ DOI: 10.1109/TMM.2022.3214989 http://dx.doi.org/10.1109/TMM.2022.3214989 ]
Nagrani A , Yang S , Arnab A , Jansen A , Schmid C , Sun C . 2021 . Attention bottlenecks for multimodal fusion . Advances in Neural Information Processing Systems , 34 : 14200 - 14213 . [ DOI: 10.48550/arXiv.2107.00135 http://dx.doi.org/10.48550/arXiv.2107.00135 ]
Huang K , Shi B T , Li X , Huang S Y , Li Y K . 2022 . Multi-modal sensor fusion for auto driving perception: A survey. arXiv preprint arXiv:2202. 02703 .[ DOI: 10.48550/arXiv.2202.02703 http://dx.doi.org/10.48550/arXiv.2202.02703 ]
Li J N , Li D X , Xiong C M , Hoi S . 2022 . Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation // International Conference on Machine Learning . Seattle, USA : PMLR : 12888 - 12900 .[ DOI: 10.48550/arXiv.2201.12086 http://dx.doi.org/10.48550/arXiv.2201.12086 ]
Vaswani A , Shazeer N , Parmar N , Uszkoreit J , Jones L , Gomez A N , Kaiser Ł , Polosukhin I . 2017 . Attention is all you need // Advances in Neural Information Processing Systems . Long Beach, USA : Curran Associates, Inc.: 5998 - 6008 [ DOI: 10.48550/arXiv.1706.03762 http://dx.doi.org/10.48550/arXiv.1706.03762 ]
Dosovitskiy A , Beyer L , Kolesnikov A , Weissenborn D , Zhai X , Unterthiner T , Dehghani M , Minderer M , Heigold G , Gelly S , Uszkoreit J , Houlsby N . 2020 . An image is worth 16 x 16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010. 11929 .[ DOI: 10.48550/arXiv.2010.11929 http://dx.doi.org/10.48550/arXiv.2010.11929 ]
Müller M . 2007 . Dynamic time warping . Information retrieval for music and motion : 69 - 84 .[ DOI: 10.1007/978-3-540-74048-3 http://dx.doi.org/10.1007/978-3-540-74048-3 ]
Liu Z , Lin Y , Cao Y , Hu H , Wei Y X , Zhang Z , Lin S , Guo B N . 2021 . Swin transformer: Hierarchical vision transformer using shifted windows // Proceedings of the IEEE/CVF International Conference on Computer Vision . Montreal, Canada : IEEE: 10012 - 10022 .[ DOI: 10.1109/ICCV48922.2021.00986 http://dx.doi.org/10.1109/ICCV48922.2021.00986 ]
Brown T B , Mann B , Ryder N , Subbiah M , Kaplan J , Dhariwal P , Neelakantan A , Shyam P , Sastry G , Askell A , Agarwal S , Herbert-Voss A , Krueger G , Henighan T , Child R , Ramesh A , Ziegler DM , Wu J , Winter C , Hesse C , Chen M , Sigler E , Litwin M , Gray S , Chess B , Clark J , Berner C , McCandlish S , Radford A , Sutskever I , Amodei D . 2020 . Language models are few-shot learners. arXiv preprint arXiv:2005. 14165 .[ DOI: 10.48550/arXiv.2005.14165 http://dx.doi.org/10.48550/arXiv.2005.14165 ]
Devlin J , Chang M W , Lee K , Toutanova K . 2018 . Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810. 04805 .[ DOI: 10.48550/arXiv.1810.04805 http://dx.doi.org/10.48550/arXiv.1810.04805 ]
Zhang Z Y , Han X , Liu Z Y , Sun M S , Liu Q . 2019 . ERNIE: Enhanced language representation with informative entities. arXiv preprint arXiv:1905. 07129 .[ DOI: 10.48550/arXiv.1905.07129 http://dx.doi.org/10.48550/arXiv.1905.07129 ]
OpenAI, Achiam J, Adler S, Agarwal S, Ahmad L, Akkaya I, Aleman F L, Almeida D, Altenschmidt J, Altman S, Anadkat S, Avila R, Babuschkin I, Balaji S, Balcom V, Baltescu P, Bao H, Bavarian M, Belgum J, Bello I, Berdine J, Bernadett-Shapiro G, Berner C, Bogdonoff L, Boiko O, Boyd M, Brakman A, Brockman G, Brooks T, Brundage M, Button K, Cai T, Campbell R, Cann A, Carey B, Carlson C, Carmichael R, Chan B, Chang C, Chantzis F, Chen D, Chen S, Chen R, Chen J, Chen M, Chess B, Cho C, Chu C, Chung H W, Cummings D, Currier J, Dai Y, Decareaux C, Degry T, Deutsch N, Deville D, Dhar A, Dohan D, Dowling S, Dunning S, Ecoffet A, Eleti A, Eloundou T, Farhi D, Fedus L, Felix N, Fishman SP, Forte J, Fulford I, Gao L, Georges E, Gibson C, Goel V, Gogineni T, Goh G, Gontijo-Lopes R, Gordon J, Grafstein M, Gray S, Greene R, Gross J, Gu S S, Guo Y, Hallacy C, Han J, Harris J, He Y, Heaton M, Heidecke J, Hesse C, Hickey A, Hickey W, Hoeschele P, Houghton B, Hsu K, Hu S, Hu X, Huizinga J, Jain S, Jain S, Jang J, Jiang A, Jiang R, Jin H, Jin D, Jomoto S, Jonn B, Jun H, Kaftan T, Kaiser Ł, Kamali A, Kanitscheid er I, Keskar N S, Khan T, Kilpatrick L, Kim J W, Kim C, Kim Y, Kirchner J H, Kiros J, Knight M, Kokotajlo D, Kondraciuk Ł, Kondrich A, Konstantinidis A, Kosic K, Krueger G, Kuo V, Lampe M, Lan I, Lee T, Leike J, Leung J, Levy D, Li CM, Lim R, Lin M, Lin S, Litwin M, Lopez T, Lowe R, Lue P, Makanju A, Malfacini K, Manning S, Markov T, Markovski Y, Martin B, Mayer K, Mayne A, McGrew B, McKinney S M, McLeavey C, McMillan P, McNeil J, Medina D, Mehta A, Menick J, Metz L, Mishchenko A, Mishkin P, Monaco V, Morikawa E, Mossing D, Mu T, Murati M, Murk O, Mély D, Nair A, Nakano R, Nayak R, Neelakantan A, Ngo R, Noh H, Ouyang L, O'Keefe C, Pachocki J, Paino A, Palermo J, Pantuliano A, Parascandolo G, Parish J, Parparita E, Passos A, Pavlov M, Peng A, Perelman A, Peres F, Petrov M, Pinto H, (Rai)Pokorny M, Pokrass M, Pong V H, Powell T, Power A, Power B, Proehl E, Puri R, Radford A, Rae J, 
Ramesh A, Raymond C, Real F, Rimbach K, Ross C, Rotsted B, Roussez H, Ryder N, Saltarelli M, Sanders T, Santurkar S, Sastry G, Schmidt H, Schnurr D, Schulman J, Selsam D, Sheppard K, Sherbakov T, Shieh J, Shoker S, Shyam P, Sidor S, Sigler E, Simens M, Sitkin J, Slama K, Sohl I, Sokolowsky B, Song Y, Staudacher N, Such FP, Summers N, Sutskever I, Tang J, Tezak N, Thompson M B, Tillet P, Tootoonchian A, Tseng E, Tuggle P, Turley N, Tworek J, Uribe J, Vallone A, Vijayvergiya A, Voss C, Wainwright C, Wang J J, Wang A, Wang B, Ward J, Wei J, Weinmann C, Welihinda A, Welinder P, Weng J, Weng L, Wiethoff M, Willner D, Winter C, Wolrich S, Wong H, Workman L, Wu S, Wu J, Wu M, Xiao K, Xu T, Yoo S, Yu K, Yuan Q, Zaremba W, Zellers R, Zhang C, Zhang M, Zhao S, Zheng T, Zhuang J, Zhuk W, Zoph B . 2023 . GPT-4 technical report //arXiv preprint arXiv:2303. 08774 . Published online: arXiv.org. [ DOI: 10.48550/arXiv.2303.08774 http://dx.doi.org/10.48550/arXiv.2303.08774 ]
Anil R , Dai A M , Firat O , Johnson M , Lepikhin D , Passos A , Shakeri S , Taropa E , Bailey P , Chen Z , Chu E , Clark J H , Shafey L E , Huang Y P , Meier-Hellstern K , Mishra G , Moreira E , Omernick M , Robinson K , Ruder S , Tay Y , Xiao K , Xu Y , Zhang Y , Hernandez Abrego G , Ahn J , Austin J , Barham P , Botha J , Bradbury J , Brahma S , Brooks K , Catasta M , Cheng Y , Cherry C , Choquette-Choo C A , Chowdhery A , Crepy C , Dave S , Dehghani M , Dev S , Devlin J , Díaz M , Du N , Dyer E , Feinberg V , Feng F X Y , Fienber V , Freitag M , Garcia X , Gehrmann S , Gonzalez L , Gur-Ari G , Hand S , Hashemi H , Hou L , Howland J , Hu A , Hui J , Hurwitz J , Isard M , Ittycheriah A , Jagielski M , Jia W H , Kenealy K , Krikun M , Kudugunta S , Lan C , Lee K , Lee B , Li E , Li M , Li W , Li YG , Li J , Lim H , Lin H Z , Liu Z T , Liu F , Maggioni M , Mahendru A , Maynez J , Misra V , Moussalem M , Nado Z , Nham J , Ni E , Nystrom A , Parrish A , Pellat M , Polacek M , Polozov A , Pope R , Qiao S , Reif E , Richter B , Riley P , Castro Ros A , Roy A , Saeta B , Samuel R , Shelby R , Slone A , Smilkov D , So D R , Sohn D , Tokumine S , Valter D , Vasudevan V , Vodrahalli K , Wang X , Wang P D , Wang Z R , Wang T , Wieting J , Wu Y H , Xu K , Xu Y H , Xue L T , Yin PC , Yu J H , Zhang Q , Zheng S , Zheng C , Zhou W K , Zhou D Y , Petrov S , Wu Y H . 2023 . Palm 2 technical report. arXiv preprint arXiv:2305. 10403 .[ DOI: 10.48550/arXiv.2305.10403 http://dx.doi.org/10.48550/arXiv.2305.10403 ]
Sidner C L , Kidd C D , Lee C , Lesh N . 2004 . Where to look: a study of human-robot engagement // ACM International Conference on Intelligent User Interfaces . Carlsbad, USA : ACM Press 78 - 84 .[ DOI: 10.1145/964442.964458 http://dx.doi.org/10.1145/964442.964458 ]
Vogt T , André E . 2006 . Improving automatic emotion recognition from speech via gender differentiation . [ DOI: 10.1007/978-3-642-11684-1_8 http://dx.doi.org/10.1007/978-3-642-11684-1_8 ]
Kim J , André E , Vogt T . 2009 . Towards user-independent classification of multimodal emotional signals // International Conference on Affective Computing and Intelligent Interaction and Workshops . Munich, Germany : IEEE: 1 - 7 . [ DOI: 10.1109/ACII.2009.5349495 http://dx.doi.org/10.1109/ACII.2009.5349495 .]
Akyunus M , Gençöz T , Aka B T . 2021 . Age and sex differences in basic personality traits and interpersonal problems across young adulthood . Current Psychology , 40 : 2518 - 2527 . [ DOI: 10.1007/s12144-019-0165-z http://dx.doi.org/10.1007/s12144-019-0165-z ]
Weisberg Y J , DeYoung C G , Hirsh J B . 2011 . Gender differences in personality across the ten aspects of the Big Five . Frontiers in Psychology , 2 : 11757 . [ DOI: 10.3389/fpsyg.2011.00178 http://dx.doi.org/10.3389/fpsyg.2011.00178 ]
Hang Y , Soto C J , Speyer L G , Haring L , Lee B , Ostendorf F , Mõttus R . 2021 . Age differences in the big five personality domains, facets and nuances: A replication across the life span . Journal of Research in Personality 93 : 104121 . [ DOI: 10.1016/j.jrp.2021.104121 http://dx.doi.org/10.1016/j.jrp.2021.104121 ]
Henrich J . 2015 . Culture and social behavior . Current Opinion in Behavioral Sciences 3 : 84 - 89 . [ DOI: 10.1016/j.cobeha.2015.02.001 http://dx.doi.org/10.1016/j.cobeha.2015.02.001 ]
Elfenbein H A , Ambady N . 2002 . On the universality and cultural specificity of emotion recognition: a meta-analysis . Psychological Bulletin , 128 ( 2 ): 203 . [ DOI: 10.1037/0033-2909.128.2.203 http://dx.doi.org/10.1037/0033-2909.128.2.203 ]
Celiktutan O , Gunes H . 2015 . Computational analysis of human-robot interactions through first-person vision: Personality and interaction experience // IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN) . Kobe, Japan : IEEE: 815 - 820 .[ DOI: 10.1109/ROMAN.2015.7333602 http://dx.doi.org/10.1109/ROMAN.2015.7333602 ]
Ivaldi S , Lefort S , Peters J , Chetouani M. , Provasi J , Zibetti E . 2017 . Towards engagement models that consider individual factors in HRI: on the relation of extroversion and negative attitude towards robots to gaze and speech during a human–robot assembly task: experiments with the iCub humanoid . International Journal of Social Robotics , 9 : 63 - 86 .[ DOI: 10.1007/s12369-016-0357-8 http://dx.doi.org/10.1007/s12369-016-0357-8 ]
Cuperman R , Ickes W . 2009 . Big Five predictors of behavior and perceptions in initial dyadic interactions: Personality similarity helps extraverts and introverts, but hurts “disagreeables” . Journal of Personality and Social Psychology , 97 ( 4 ): 667 .[ DOI: DOI:10.1037/a0015741 http://dx.doi.org/DOI:10.1037/a0015741 ]
Lin H , Tov W , Qiu L . 2014 . Emotional disclosure on social networking sites: The role of network structure and psychological needs . Computers in Human Behavior , 41 : 342 - 350 .[ DOI: 10.1016/j.chb.2014.09.045 http://dx.doi.org/10.1016/j.chb.2014.09.045 ]
Kramer A D I , Guillory J E , Hancock J T . 2014 . Experimental evidence of massive-scale emotional contagion through social networks . Proceedings of the National Academy of Sciences , 111 ( 24 ): 8788 - 8790 .[ DOI: 10.1073/pnas.132004011 http://dx.doi.org/10.1073/pnas.132004011 ]
Shen J , Brdiczka O , Liu J . 2013 . Understanding Email Writers: Personality Prediction from Email Messages // International Conference on User Modeling, Adaptation, and Personalization . Heidelberg, Germany : Springer, Berlin, Heidelberg: 318 – 330 [ DOI: 10.1007/978-3-642-38844-6_29 http://dx.doi.org/10.1007/978-3-642-38844-6_29 ]
Zhao S C , Yao H X , Gao Y , Ding G G , Chua T S . 2016 . Predicting personalized emotion perceptions of social images // ACM International Conference on Multimedia . Shanghai, China : 1385 - 1394 .[ DOI: 10.1109/TAFFC.2016.2628787 http://dx.doi.org/10.1109/TAFFC.2016.2628787 ]
Tang J , Zhang Y , Sun J , Rao J , Yu W , Chen Y . Quantitative study of individual emotional states in social networks . IEEE Transactions on Affective Computing , 2011 , 3 ( 2 ): 132 - 144 . [ DOI: 10.1109/T-AFFC.2011.5959157 http://dx.doi.org/10.1109/T-AFFC.2011.5959157 ]
Yang J F , She D Y , Sun M , Cheng M M . 2018 . Visual sentiment prediction based on automatic discovery of affective regions . IEEE Transactions on Multimedia , 20 ( 9 ): 2513 - 2525 . [ DOI: 10.1109/TMM.2018.2816938 http://dx.doi.org/10.1109/TMM.2018.2816938 ]
Rui T , Cui P , Zhu W W . 2017 . Joint user-interest and social-influence emotion prediction for individuals . Neurocomputing , 230 : 66 - 76 . DOI: 10.1016/j.neucom.2016.11.054 http://dx.doi.org/10.1016/j.neucom.2016.11.054
Cambria E , Li Y , Xing F Z , Poria S , Kwok K . 2020 . SenticNet 6 : Ensemble application of symbolic and subsymbolic AI for sentiment analysis //ACM International Conference on Information & Knowledge Management. Galway, Ireland : ACM: 105 - 114 [ DOI: 10.1145/3340531.3412003 http://dx.doi.org/10.1145/3340531.3412003 ]
Zhao S C , Jia Z Z , Chen H , Li L D , Ding G G , Han J G . 2019 . PDANet: Polarity-consistent deep attention network for fine-grained visual emotion regression // ACM International Conference on Multimedia . Nice, France : ACM: 192 - 201 [ DOI: 10.1145/3343031.3351062 http://dx.doi.org/10.1145/3343031.3351062 ]
Jain V , Crowley J L , Dey A K , Lux A . 2014 . Depression estimation using audiovisual features and fisher vector encoding // International Workshop on Audio/Visual Emotion Challenge . Orlando, FL, USA : ACM: 87 - 91 [ DOI: 10.1145/2661806.2661817 http://dx.doi.org/10.1145/2661806.2661817 ]
Zhao Z , Zheng Y , Zhang Z , Wang H , Zhao Y , Li C . 2018 . Exploring spatio-temporal representations by integrating attention-based bidirectional-LSTM-RNNs and FCNs for speech emotion recognition // Annual Conference of the International Speech Communication Association . Hyderabad, India : International Speech Communication Association (ISCA): 272 - 276 [ DOI: 10.21437/Interspeech.2018-1477 http://dx.doi.org/10.21437/Interspeech.2018-1477 ]
Mollahosseini A , Hasani B , Mahoor M H . 2017 . AffectNet: A database for facial expression, valence, and arousal computing in the wild . IEEE Transactions on Affective Computing 10 ( 1 ): 18 - 31 [ DOI: 10.1109/TAFFC.2017.2740923 http://dx.doi.org/10.1109/TAFFC.2017.2740923 ]
Ringeval F , Schuller B , Valstar M , Cummins N , Liu A H , Sonderegger A , Batliner A , Steidl S . 2019 . AVEC 2019 workshop and challenge : state-of-mind, detecting depression with AI, and cross-cultural affect recognition// International on Audio/Visual Emotion Challenge and Workshop . Nice, France: 3 - 12 [ DOI: 10.1145/3347320.3357688 http://dx.doi.org/10.1145/3347320.3357688 ]
Ji X, Zhou H, Wang K, Wu W, Loy C C, Cao X, Xu F. 2021. Audio-driven emotional video portraits // IEEE/CVF Conference on Computer Vision and Pattern Recognition. Virtual Conference: IEEE: 14080-14089 [DOI: 10.1109/CVPR.2021.01400]
Ekman P. 1992. An argument for basic emotions. Cognition & Emotion, 6(3-4): 169-200 [DOI: 10.1080/02699939208411068]
Mehrabian A. 1996. Pleasure-arousal-dominance: a general framework for describing and measuring individual differences in temperament. Current Psychology, 14: 261-292 [DOI: 10.1007/BF02686918]
Wang Y, Shen Y, Liu Z, Liang P P, Zadeh A, Morency L P. 2019. Words can shift: dynamically adjusting word representations using nonverbal behaviors // AAAI Conference on Artificial Intelligence. Honolulu, USA: AAAI Press: 7216-7223 [DOI: 10.1609/aaai.v33i01.4706]
Yadollahi A, Shahraki A G, Zaiane O R. 2017. Current state of text sentiment analysis from opinion to emotion mining. ACM Computing Surveys, 50(2): 1-33 [DOI: 10.1145/3057270]
Mohammad S M, Bravo-Marquez F. 2017. WASSA-2017 shared task on emotion intensity // EMNLP Workshop on Computational Approaches to Subjectivity, Sentiment, and Social Media. Copenhagen, Denmark: Association for Computational Linguistics [DOI: 10.48550/arXiv.1708.03700]
Poria S, Majumder N, Mihalcea R, Hovy E. 2019. Emotion recognition in conversation: research challenges, datasets, and recent advances. IEEE Access, 7: 100943-100953 [DOI: 10.1109/ACCESS.2019.2929050]
Barnes J, Klinger R, Schulte im Walde S. 2018. Projecting embeddings for domain adaptation: joint modeling of sentiment analysis in diverse domains // International Conference on Computational Linguistics. Santa Fe, USA: Association for Computational Linguistics [DOI: 10.48550/arXiv.1806.04381]
Lubis N, Sakti S, Yoshino K, Nakamura S. 2018. Eliciting positive emotion through affect-sensitive dialogue response generation: a neural network approach // AAAI Conference on Artificial Intelligence. New Orleans, USA: AAAI Press [DOI: 10.1609/aaai.v32i1.11955]
Zadeh A, Liang P P, Poria S, Vij P, Cambria E, Morency L P. 2018. Multi-attention recurrent network for human communication comprehension // AAAI Conference on Artificial Intelligence. New Orleans, USA: AAAI Press [DOI: 10.1609/aaai.v32i1.12024]
Ghandeharioun A, McDuff D, Czerwinski M, Picard R W. 2019. EMMA: an emotion-aware wellbeing chatbot // International Conference on Affective Computing and Intelligent Interaction. Cambridge, United Kingdom: IEEE: 1-7 [DOI: 10.1109/ACII.2019.8925455]
Zhu J Y, Park T, Isola P, Efros A A. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks // IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2223-2232 [DOI: 10.1109/ICCV.2017.244]
Blanz V, Vetter T. 2023. A morphable model for the synthesis of 3D faces. Seminal Graphics Papers: Pushing the Boundaries, Volume 2: 157-164 [DOI: 10.1145/3596711.3596730]
Mildenhall B, Srinivasan P P, Tancik M, Barron J T, Ramamoorthi R, Ng R. 2020. NeRF: representing scenes as neural radiance fields for view synthesis // European Conference on Computer Vision. Virtual (Glasgow, United Kingdom): Springer: 405-421 [DOI: 10.1007/978-3-030-58452-8_24]
Kerbl B, Kopanas G, Leimkühler T, Drettakis G. 2023. 3D Gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4): 139:1-139:14 [DOI: 10.1145/3592430]
Chen Y, Wang L, Li Q, Xiao H, Zhang S. 2024. MonoGaussianAvatar: monocular Gaussian point-based head avatar // ACM SIGGRAPH. ACM [DOI: 10.1145/3641519.3657499]
Giebenhain S, Kirschstein T, Rünz M. 2024. NPGA: neural parametric Gaussian avatars // ACM SIGGRAPH Asia. ACM [DOI: 10.1145/3680528.3687689]
Qian S, Kirschstein T, Schoneveld L, Davoli D, Giebenhain S, Nießner M. 2024. GaussianAvatars: photorealistic head avatars with rigged 3D Gaussians // IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 20299-20309
Li Z, Zheng Z, Wang L, Liu Y. 2024. Animatable Gaussians: learning pose-dependent Gaussian maps for high-fidelity human avatar modeling // IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 19711-19722
Lin S, Li Z, Su Z, Zheng Z, Zhang H, Liu Y. 2024. LayGA: layered Gaussian avatars for animatable clothing transfer // ACM SIGGRAPH. Los Angeles, USA: ACM: 1-11 [DOI: 10.1145/3641519.3657501]
Zhao W X, Zhou K, Li J, Tang T, Wang X, Hou Y, Min Y, Zhang B, Zhang J, Dong Z, Du Y, Yang C, Chen Y, Chen Z, Jiang J, Ren R, Li Y, Tang X, Liu Z, Liu P, Nie J Y, Wen J R. 2023. A survey of large language models. arXiv preprint arXiv:2303.18223 [DOI: 10.48550/arXiv.2303.18223]
Ng E, Romero J, Bagautdinov T, Bai S, Darrell T, Kanazawa A, Richard A. 2024. From audio to photoreal embodiment: synthesizing humans in conversations // IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 1001-1010
Lin J, Zeng A, Lu S, Cai Y, Zhang R, Wang H, Zhang L. 2023. Motion-X: a large-scale 3D expressive whole-body human motion dataset // Advances in Neural Information Processing Systems, Datasets and Benchmarks Track
Xu L, Lv X, Yan Y, Jin X, Wu S, Xu C, Liu Y, Zhou Y, Rao F, Sheng X, Liu Y, Zeng W, Yang X. 2024. Inter-X: towards versatile human-human interaction analysis // IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 22260-22271
Yi H, Thies J, Black M J, Peng X B, Rempe D. 2025. Generating human interaction motions in scenes with text control // European Conference on Computer Vision. Milan, Italy: Springer: 246-263 [DOI: 10.1007/978-3-031-73235-5_14]
Wang Tianmiao, Tao Yong, Chen Yang. 2012. Research status and development trends of service robot technology. Science China: Information Sciences, 42(9): 1049-1066
王田苗, 陶永, 陈阳. 2012. 服务机器人技术研究现状与发展趋势. 中国科学: 信息科学, 42(9): 1049-1066
Zhao L L, Li X A, Zhao J D, Liu H. 2024. Research on the development strategy of space robots for autonomous maintenance of spacecraft. Engineering Sciences, 26(1): 149-159
赵亮亮, 李雪皑, 赵京东, 刘宏. 2024. 面向航天器自主维护的空间机器人发展战略研究. 中国工程科学, 26(1): 149-159
Li M Y, Yang J, Jiao N D, Wang Y C, Liu L Q. 2022. A review of the latest research progress of micro-nano robots. Robot, 44(6): 732-749
李梦月, 杨佳, 焦念东, 王越超, 刘连庆. 2022. 微纳米机器人的最新研究进展综述. 机器人, 44(6): 732-749
Yang G Z, Bellingham J, Dupont P E, Fischer P, Floridi L, Full R, Jacobstein N, Kumar V, McNutt M, Merrifield R, Nelson B J, Scassellati B, Taddeo M, Taylor R, Veloso M. 2018. The grand challenges of Science Robotics. Science Robotics, 3(14): eaar7650 [DOI: 10.1126/scirobotics.aar7650]
Wang Z, Feng X H, Li Y M, Zhuang J X. 2018. The current situation and future of the intelligent robot industry. Artificial Intelligence, (3): 12-27
王哲, 冯晓辉, 李艺铭, 庄金鑫. 2018. 智能机器人产业的现状与未来. 人工智能, (3): 12-27
Ficocelli M, Terao J, Nejat G. 2015. Promoting interactions between humans and robots using robotic emotional behavior. IEEE Transactions on Cybernetics, 46(12): 2911-2923 [DOI: 10.1109/TCYB.2015.2429113]
Zhou H, Huang M, Zhang T, Zhu X, Liu B. 2018. Emotional chatting machine: emotional conversation generation with internal and external memory // AAAI Conference on Artificial Intelligence. New Orleans, USA: AAAI Press: 730-738 [DOI: 10.1609/aaai.v32i1.11325]
Hong A, Lunscher N, Hu T, Tsuboi Y, Zhang X, Alves S. 2021. A multimodal emotional human-robot interaction architecture for social robots engaged in bidirectional communication. IEEE Transactions on Cybernetics, 51(12): 5954-5968 [DOI: 10.1109/TCYB.2020.3024567]
Chen L, Li M, Wu M, Pedrycz W, Hirota K. 2023. Coupled multimodal emotional feature analysis based on broad-deep fusion networks in human-robot interaction. IEEE Transactions on Neural Networks and Learning Systems, 35(7): 9663-9673
Wang G Q, Pei Y Q, Yang Y, Xu X, Wang Z, Shen H T. 2024. Multimodal trustworthy interaction: from multimodal information fusion to a trinitarian human-robot-digital human interaction model. Science China: Information Sciences, 54(4): 872-892
王国庆, 裴云强, 杨阳, 徐行, 汪政, 申恒涛. 2024. 多模可信交互: 从多模态信息融合到人–机器人–数字人三位一体式交互模型. 中国科学: 信息科学, 54(4): 872-892
Abdel-Hamid O, Mohamed A, Jiang H, Deng L. 2014. Convolutional neural networks for speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(10): 1533-1545 [DOI: 10.1109/TASLP.2014.2339736]
Ren S Q, He K M, Girshick R, Sun J. 2016. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6): 1137-1149 [DOI: 10.1109/TPAMI.2016.2577031]
Badrinarayanan V, Kendall A, Cipolla R. 2017. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12): 2481-2495 [DOI: 10.1109/TPAMI.2016.2644615]
Devlin J, Chang M W, Lee K, Toutanova K. 2019. BERT: pre-training of deep bidirectional transformers for language understanding // Annual Conference of the North American Chapter of the Association for Computational Linguistics. Minneapolis, USA: 4171-4186 [DOI: 10.18653/v1/N19-1423]
Driess D, Xia F, Sajjadi M S M, Lynch C, Chowdhery A, Ichter B, Wahid A, Tompson J, Vuong Q, Yu T, Huang W, Chebotar Y, Sermanet P, Duckworth D, Levine S, Vanhoucke V, Hausman K, Toussaint M, Greff K, Zeng A, Mordatch I, Florence P. 2023. PaLM-E: an embodied multimodal language model // International Conference on Machine Learning. Honolulu, USA: PMLR [DOI: 10.48550/arXiv.2303.03378]
Li X, Zhang M, Geng Y, Geng H, Long Y, Shen Y, Zhang R, Liu J, Dong H. 2024. ManipLLM: embodied multimodal large language model for object-centric robotic manipulation // IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 18061-18070