Attention-guided three-stream convolutional neural network for microexpression recognition
Vol. 29, Issue 1, Pages: 111-122 (2024)
Published: 16 January 2024
DOI: 10.11834/jig.230053
Zhao Minghua, Dong Shuangshuang, Hu Jing, Du Shuangli, Shi Cheng, Li Peng, Shi Zhenghao. 2024. Attention-guided three-stream convolutional neural network for microexpression recognition. Journal of Image and Graphics, 29(01):0111-0122
Objective
In recent years, microexpression recognition has shown remarkable application value in fields such as psychological counseling, lie detection, and intention analysis. However, unlike macro-expressions, which are produced consciously, microexpressions often occur in high-stakes situations and are generated unconsciously. They are characterized by small action amplitudes and short duration, and they usually affect only local facial areas. These properties make microexpression recognition difficult. Traditional methods used in early research fall mainly into two families: methods based on local binary patterns and methods based on optical flow. The former effectively extract the texture features of microexpressions, whereas the latter compute pixel changes in the temporal domain and the relationship between adjacent frames, providing rich, key input information for the network. Although traditional methods based on texture and optical flow features made good progress in early microexpression recognition, they often incur considerable cost and leave room for improvement in recognition accuracy and robustness. Later, with the development of machine learning, microexpression recognition based on deep learning gradually became the mainstream of research in this field. Such methods use neural networks to extract features from input image sequences after a series of preprocessing operations (facial cropping, alignment, and grayscale conversion) and classify them to obtain the final recognition result. The introduction of deep learning has substantially improved recognition performance. However, given the characteristics of microexpressions themselves, recognition accuracy still has considerable room for improvement, and the limited scale of existing microexpression datasets further restricts the recognition of such emotional behaviors.
To solve these problems, this paper proposes an attention-guided three-stream convolutional neural network (ATSCNN) for microexpression recognition.
Method
First, considering that the motion changes between adjacent frames of a microexpression are very subtle, and to reduce redundant information and computation while preserving the important features of the microexpression, this paper performs preprocessing (facial alignment and cropping) only on the two key frames of each microexpression (the onset frame and the apex frame), yielding single-channel grayscale images with a resolution of 128 × 128 pixels and reducing the influence of nonfacial areas on recognition. Then, because optical flow captures representative motion features between two frames of a microexpression, it offers a higher signal-to-noise ratio than the raw data and provides rich, critical input features for the network; this paper therefore uses the total variation-L1 (TV-L1) energy functional to extract optical flow features between the two frames (the horizontal component of optical flow, the vertical component of optical flow, and the optical strain). Next, in the feature extraction stage, to overcome the overfitting problem caused by the limited sample size, three identical four-layer convolutional neural networks extract features from the optical flow horizontal component, the optical flow vertical component, and the optical strain, respectively (the input channel numbers of the four convolutional layers are 1, 3, 5, and 8, and the output channel numbers are 3, 5, 8, and 16), thus improving network performance.
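As a hedged illustration of the optical-strain input described above: given the TV-L1 flow components (computed separately, e.g. with an off-the-shelf TV-L1 implementation), optical strain can be derived from the spatial gradients of the flow field. The function name `optical_strain` is our own, not from the paper:

```python
import numpy as np

def optical_strain(u, v):
    """Optical-strain magnitude from the horizontal (u) and vertical (v)
    flow components: eps = 0.5 * (grad F + grad F^T) for F = (u, v)."""
    du_dy, du_dx = np.gradient(u)  # axis 0 is rows (y), axis 1 is columns (x)
    dv_dy, dv_dx = np.gradient(v)
    e_xx = du_dx                    # normal strain along x
    e_yy = dv_dy                    # normal strain along y
    e_xy = 0.5 * (du_dy + dv_dx)    # shear strain
    return np.sqrt(e_xx**2 + e_yy**2 + 2.0 * e_xy**2)

# Toy check: a pure horizontal stretch u = x, v = 0 has unit strain everywhere.
x = np.tile(np.arange(8.0), (8, 1))
mag = optical_strain(x, np.zeros_like(x))
```

Because the strain magnitude is built from flow derivatives, it highlights subtle local deformations of the face that the raw flow components can miss.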
Afterward, because the image sequences in the microexpression datasets used in this paper inevitably contain redundant information beyond the face, a convolutional block attention module (CBAM), which connects channel attention and spatial attention in series, is added after the shallow convolutional neural network in each stream. The module focuses on the important information in the input and suppresses irrelevant information while attending to both the channel and spatial dimensions, thereby enhancing the network's ability to obtain effective features and improving recognition performance. Finally, the extracted features are fed into a fully connected layer for microexpression emotion classification (negative, positive, and surprise). In addition, the entire model architecture uses the scaled exponential linear unit (SELU) activation function to avoid the potential problems of neuron death and vanishing gradients in the commonly used rectified linear unit (ReLU) activation and to speed up the convergence of the neural network.
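A minimal PyTorch sketch of the architecture described above — three identical shallow streams with the stated channel widths (1→3→5→8→16), a CBAM block per stream, SELU activations, and a fully connected classifier over three emotion classes. The kernel sizes, pooling layout, and CBAM reduction ratio are our assumptions, not values given in the abstract, and the class names are hypothetical:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional block attention: channel attention, then spatial attention."""
    def __init__(self, channels, reduction=4, kernel_size=7):
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.mlp = nn.Sequential(nn.Linear(channels, hidden), nn.ReLU(),
                                 nn.Linear(hidden, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention from global average- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention from channel-wise average and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

def make_stream():
    # Four shallow conv layers with the channel widths from the text
    # (1 -> 3 -> 5 -> 8 -> 16); SELU activations throughout, CBAM at the end.
    chans = [1, 3, 5, 8, 16]
    layers = []
    for cin, cout in zip(chans[:-1], chans[1:]):
        layers += [nn.Conv2d(cin, cout, 3, padding=1), nn.SELU(), nn.MaxPool2d(2)]
    layers.append(CBAM(16))
    return nn.Sequential(*layers)

class ATSCNNSketch(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        self.streams = nn.ModuleList(make_stream() for _ in range(3))
        # A 128x128 input halved by four 2x2 poolings gives 8x8 maps per stream.
        self.fc = nn.Linear(3 * 16 * 8 * 8, num_classes)

    def forward(self, u, v, strain):
        feats = [s(x).flatten(1) for s, x in zip(self.streams, (u, v, strain))]
        return self.fc(torch.cat(feats, dim=1))

# Shape check on random flow inputs (batch of 2, single-channel 128x128 maps).
u, v, s = (torch.randn(2, 1, 128, 128) for _ in range(3))
logits = ATSCNNSketch()(u, v, s)
```

Keeping each stream this shallow limits the parameter count, which is consistent with the paper's stated goal of avoiding overfitting on small microexpression datasets.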
Result
This paper conducted experiments on the combined microexpression dataset using the leave-one-subject-out (LOSO) cross-validation strategy, in which each subject in turn serves as the test set and all remaining samples are used for training. This protocol fully utilizes the samples, offers a degree of generalization, and is the most commonly used validation strategy in current microexpression recognition research. The unweighted average recall (UAR) and unweighted F1-score (UF1) reached 0.735 1 and 0.720 5, respectively. Compared with the Dual-Inception model, the best-performing comparison method, UAR and UF1 increased by 0.060 7 and 0.068 3, respectively. To further verify the effectiveness of the proposed ATSCNN architecture, several ablation experiments were also conducted on the combined dataset, and their results confirmed the feasibility of this paper's method.
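For reference, both reported metrics are macro-averaged over the three emotion classes: UAR is the unweighted mean of per-class recall, and UF1 the unweighted mean of per-class F1 scores, so frequent classes get no extra weight. A small self-contained sketch (the function name `uar_uf1` is our own):

```python
def uar_uf1(y_true, y_pred):
    """Unweighted average recall (UAR) and unweighted F1 (UF1):
    per-class recall and F1, averaged with equal weight per class."""
    classes = sorted(set(y_true))
    recalls, f1s = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        recalls.append(recall)
        f1s.append(f1)
    return sum(recalls) / len(classes), sum(f1s) / len(classes)

# Toy example with three classes (e.g. negative=0, positive=1, surprise=2).
uar, uf1 = uar_uf1([0, 0, 1, 1, 2, 2], [0, 0, 1, 0, 2, 2])
```

Under LOSO, these metrics are typically computed over the pooled predictions of all held-out subjects, which suits the class imbalance of combined microexpression datasets.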
Conclusion
The microexpression recognition network proposed in this paper effectively alleviates overfitting, focuses on the important information of microexpressions, and achieves state-of-the-art (SOTA) recognition performance on small-scale microexpression datasets under LOSO cross-validation, outperforming other mainstream models. The results of several ablation experiments further support the proposed method. In conclusion, the proposed method remarkably improves the effectiveness of microexpression recognition.
Keywords: microexpression recognition; optical flow; three-stream convolutional neural network; convolutional block attention module (CBAM); SELU activation function
Chaudhry R, Ravichandran A, Hager G and Vidal R. 2009. Histograms of oriented optical flow and Binet-Cauchy kernels on nonlinear dynamical systems for the recognition of human actions//Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, USA: IEEE: 1932-1939 [DOI: 10.1109/CVPR.2009.5206821]
Chen B Y, Zhang Z H, Liu N, Tan Y, Liu X Y and Chen T. 2020. Spatiotemporal convolutional neural network with convolutional block attention module for micro-expression recognition. Information, 11(8): #380 [DOI: 10.3390/info11080380]
Davison A K, Lansley C, Costen N, Tan K and Yap M H. 2018. SAMM: a spontaneous micro-facial movement dataset. IEEE Transactions on Affective Computing, 9(1): 116-129 [DOI: 10.1109/TAFFC.2016.2573832]
Frank M, Herbasz M, Sinuk K, Keller A and Nolan C. 2009. I see how you feel: training laypeople and professionals to recognize fleeting emotions//Proceedings of 2009 Annual Meeting of the International Communication Association. Sheraton, USA: [s.n.]: 1-35
Gan Y S, Liong S T, Yau W C, Huang Y C and Tan L K. 2019. Off-ApexNet on micro-expression recognition system. Signal Processing: Image Communication, 74: 129-139 [DOI: 10.1016/j.image.2019.02.005]
Haggard E A and Isaacs K S. 1966. Micromomentary facial expressions as indicators of ego mechanisms in psychotherapy//Gottschalk L A, Auerbach A H, eds. Methods of Research in Psychotherapy. New York, USA: Springer: 154-165 [DOI: 10.1007/978-1-4684-6045-2_14]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Khor H Q, See J, Phan R C W and Lin W Y. 2018. Enriched long-term recurrent convolutional network for facial micro-expression recognition//Proceedings of the 13th IEEE International Conference on Automatic Face and Gesture Recognition. Xi’an, China: IEEE: 667-674 [DOI: 10.1109/FG.2018.00105]
Klambauer G, Unterthiner T, Mayr A and Hochreiter S. 2017. Self-normalizing neural networks//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 972-981
Lai Z, Chen R, Jia J and Qian Y. 2020. Real-time micro-expression recognition based on ResNet and atrous convolutions. Journal of Ambient Intelligence and Humanized Computing, 1-12 [DOI: 10.1007/s12652-020-01779-5]
Li J T, Dong Z Z, Lu S Y, Wang S J, Yan W J, Ma Y H, Liu Y, Huang C B and Fu X L. 2023. CAS(ME)3: a third generation facial spontaneous micro-expression database with depth information and high ecological validity. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(3): 2782-2800 [DOI: 10.1109/TPAMI.2022.3174895]
Li X B, Pfister T, Huang X H, Zhao G Y and Pietikäinen M. 2013. A spontaneous micro-expression database: inducement, collection and baseline//Proceedings of the 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition. Shanghai, China: IEEE: 1-6 [DOI: 10.1109/FG.2013.6553717]
Liong S T, Gan Y S, See J, Khor H Q and Huang Y C. 2019. Shallow triple stream three-dimensional CNN (STSTNet) for micro-expression recognition//Proceedings of the 14th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2019). Lille, France: IEEE: 1-5 [DOI: 10.1109/FG.2019.8756567]
Liong S T, See J, Wong K S and Phan R C W. 2018. Less is more: micro-expression recognition from video using apex frame. Signal Processing: Image Communication, 62: 82-92 [DOI: 10.1016/j.image.2017.11.006]
Liu Y C, Du H M, Zheng L and Gedeon T. 2019. A neural micro-expression recognizer//Proceedings of the 14th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2019). Lille, France: IEEE: 1-4 [DOI: 10.1109/FG.2019.8756583]
Lu S Y, Li J T, Wang Y, Dong Z Z, Wang S J and Fu X L. 2022. A more objective quantification of micro-expression intensity through facial electromyography//Proceedings of the 2nd Workshop on Facial Micro-Expression: Advanced Techniques for Multi-Modal Facial Expression Analysis. Lisboa, Portugal: ACM: 11-17 [DOI: 10.1145/3552465.3555038]
Mehrabian A. 1965. Communication without words. The Lancet, 286(7401): #30 [DOI: 10.1016/S0140-6736(65)90194-7]
Pérez J S, Meinhardt-Llopis E and Facciolo G. 2013. TV-L1 optical flow estimation. Image Processing on Line, 3: 137-150 [DOI: 10.5201/ipol.2013.26]
Pfister T, Li X B, Zhao G Y and Pietikäinen M. 2011. Recognising spontaneous facial micro-expressions//Proceedings of 2011 International Conference on Computer Vision. Barcelona, Spain: IEEE: 1449-1456 [DOI: 10.1109/ICCV.2011.6126401]
Polikovsky S, Kameda Y and Ohta Y. 2009. Facial micro-expressions recognition using high speed camera and 3D-gradient descriptor//Proceedings of the 3rd International Conference on Imaging for Crime Detection and Prevention (ICDP 2009). London, UK: IET: 1-6 [DOI: 10.1049/ic.2009.0244]
Porter S and Ten Brinke L. 2008. Reading between the lies: identifying concealed and falsified emotions in universal facial expressions. Psychological Science, 19(5): 508-514 [DOI: 10.1111/j.1467-9280.2008.02116.x]
Qu F B, Wang S J, Yan W J, Li H, Wu S H and Fu X L. 2018. CAS(ME)2: a database for spontaneous macro-expression and micro-expression spotting and recognition. IEEE Transactions on Affective Computing, 9(4): 424-436 [DOI: 10.1109/TAFFC.2017.2654440]
Scardapane S, van Vaerenbergh S, Hussain A and Uncini A. 2020. Complex-valued neural networks with nonparametric activation functions. IEEE Transactions on Emerging Topics in Computational Intelligence, 4(2): 140-150 [DOI: 10.1109/TETCI.2018.2872600]
See J, Yap M H, Li J T, Hong X P and Wang S J. 2019. MEGC 2019 — the second facial micro-expressions grand challenge//Proceedings of the 14th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2019). Lille, France: IEEE: 1-5 [DOI: 10.1109/FG.2019.8756611]
Tang H, Zhu L J, Fan S and Liu H M. 2022. Micro-expression recognition based on optical flow method and pseudo three-dimensional residual network. Journal of Signal Processing, 38(5): 1075-1087 [DOI: 10.16798/j.issn.1003-0530.2022.05.020]
van Quang N, Chun J and Tokuyama T. 2019. CapsuleNet for micro-expression recognition//Proceedings of the 14th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2019). Lille, France: IEEE: 1-7 [DOI: 10.1109/FG.2019.8756544]
Wang C Y, Peng M, Bi T and Chen T. 2020. Micro-attention for micro-expression recognition. Neurocomputing, 410: 354-362 [DOI: 10.1016/j.neucom.2020.06.005]
Woo S, Park J, Lee J Y and Kweon I S. 2018. CBAM: convolutional block attention module//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 3-19 [DOI: 10.1007/978-3-030-01234-2_1]
Xu F and Zhang J P. 2017. Facial microexpression recognition: a survey. Acta Automatica Sinica, 43(3): 333-348 [DOI: 10.16383/j.aas.2017.c160398]
Yan W J, Li X B, Wang S J, Zhao G Y, Liu Y J, Chen Y H and Fu X L. 2014. CASME II: an improved spontaneous micro-expression database and the baseline evaluation. PLoS One, 9(1): #e86041 [DOI: 10.1371/journal.pone.0086041]
Yan W J, Wu Q, Liu Y J, Wang S J and Fu X L. 2013. CASME database: a dataset of spontaneous micro-expressions collected from neutralized faces//Proceedings of the 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition. Shanghai, China: IEEE: 1-7 [DOI: 10.1109/FG.2013.6553799]
Yap M H, See J, Hong X P and Wang S J. 2018. Facial micro-expressions grand challenge 2018 summary//Proceedings of the 13th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2018). Xi’an, China: IEEE: 675-678 [DOI: 10.1109/FG.2018.00106]
Yarotsky D. 2017. Error bounds for approximations with deep ReLU networks. Neural Networks, 94: 103-114 [DOI: 10.1016/j.neunet.2017.07.002]
Zhang J H, Liu F and Qi J Y. 2022. Lightweight micro-expression recognition architecture based on Bottleneck Transformer. Computer Science, 49(S1): 370-377 [DOI: 10.11896/jsjkx.210500023]
Zhang J H, Liu F and Zhou A M. 2021. Off-TANet: a lightweight neural micro-expression recognizer with optical flow features and integrated attention mechanism//Proceedings of the 18th Pacific Rim International Conference on Artificial Intelligence. Hanoi, Vietnam: Springer: 266-279 [DOI: 10.1007/978-3-030-89188-6_20]
Zhang M, Fu Q F, Chen Y H and Fu X L. 2014. Emotional context influences micro-expression recognition. PLoS One, 9(4): #e95018 [DOI: 10.1371/journal.pone.0095018]
Zhou L, Mao Q R and Xue L Y. 2019. Dual-Inception network for cross-database micro-expression recognition//Proceedings of the 14th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2019). Lille, France: IEEE: 1-5 [DOI: 10.1109/FG.2019.8756579]