Pose guidance and multi-scale feature fusion for occluded person re-identification
2024, Vol. 29, No. 8, Pages 2364-2376
Print publication date: 2024-08-16
DOI: 10.11834/jig.230523
Zhang Hongying, Liu Tengfei, Luo Qian, Zhang Tao. 2024. Pose guidance and multi-scale feature fusion for occluded person re-identification. Journal of Image and Graphics, 29(08):2364-2376
Objective
In person re-identification, occlusion alters pedestrian appearance and reduces the discriminability of pedestrian features, so conventional methods that rely only on the visible parts still misidentify pedestrians. To address this problem, we propose an occluded person re-identification method that fuses pose guidance and multi-scale features.
Method
First, a feature restoration module is constructed that uses information from the regions adjacent to an occlusion to recover the semantic information of occluded areas in the feature space, thereby repairing the features of missing parts (see the sketch after this paragraph). Second, to extract effective pose information from the restored image features, a pose guidance module is designed that uses pose estimation to guide feature extraction and achieve more accurate pedestrian matching. Finally, a feature enhancement module is built that incorporates salient-region detection to strengthen informative body-part features while suppressing interference from background information.
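To make the restoration idea concrete, below is a minimal PyTorch sketch of feature-level restoration, assuming pooled features for a small set of divided regions and a visibility mask. The soft-clustering encoder, the prototype-based decoder, and all names and shapes are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of feature-level restoration (illustrative, not the paper's code):
# visible-region features are encoded into cluster prototypes, and occluded
# regions are re-expressed from those prototypes.
import torch
import torch.nn as nn

class FeatureRestoration(nn.Module):
    def __init__(self, dim=2048, num_clusters=6):
        super().__init__()
        self.to_assign = nn.Linear(dim, num_clusters)  # soft cluster assignment
        self.to_query = nn.Linear(dim, dim)            # decoder query projection

    def forward(self, region_feats, visible_mask):
        # region_feats: (B, R, C) pooled features of R divided regions
        # visible_mask: (B, R) float, 1 for visible regions, 0 for occluded ones
        assign = self.to_assign(region_feats).softmax(dim=-1)      # (B, R, K)
        assign = assign * visible_mask.unsqueeze(-1)               # encode visible regions only
        clusters = torch.einsum('brk,brc->bkc', assign, region_feats)
        clusters = clusters / (assign.sum(dim=1, keepdim=True).transpose(1, 2) + 1e-6)
        # decoder: every region (including occluded ones) attends to the prototypes
        attn = torch.einsum('brc,bkc->brk', self.to_query(region_feats), clusters)
        restored = torch.einsum('brk,bkc->brc', attn.softmax(dim=-1), clusters)
        # keep original features where visible, use restored ones where occluded
        m = visible_mask.unsqueeze(-1)
        return m * region_feats + (1 - m) * restored
```

For instance, calling the module with `torch.randn(2, 6, 2048)` and a (2, 6) binary mask returns a (2, 6, 2048) tensor in which the occluded rows are replaced by mixtures of the visible-region prototypes.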
Result
Comparative and ablation experiments were conducted on three public datasets. The mean average precision (mAP) and Rank-1 accuracy reach 88.8% and 95.5% on Market1501, 79.2% and 89.3% on DukeMTMC-reID (Duke multi-tracking multi-camera re-identification), and 51.7% and 60.3% on Occluded-DukeMTMC (occluded Duke multi-tracking multi-camera re-identification), respectively. The comparative results show that the proposed fusion method improves the accuracy of pedestrian matching and is competitive with existing approaches.
Conclusion
The proposed pose guidance and multi-scale fusion method restores part features lost to occlusion and, guided by pose information, fuses image features of different granularities. It improves the recognition accuracy of the model and effectively alleviates the misidentification caused by occlusion, which verifies the effectiveness of the method.
Objective
Person re-identification (ReID) is an important task in computer vision that aims to identify and associate the same person across multiple surveillance cameras by extracting and matching pedestrian features under different scenarios. Occluded person ReID is a particularly challenging and specialized variant of this task. In real-world settings, occlusion is common and limits the practical application of ReID techniques to a certain extent. Recently, occluded person ReID has attracted increasing attention, and several methods have been proposed to address occlusion, achieving impressive results. These methods primarily focus on the visible regions in images: they first locate the visible regions and then design dedicated models to extract discriminative features from those regions for accurate person matching. Such methods typically discard features from the occluded areas and rely on discriminative features from the non-occluded regions for matching. Although they achieve impressive results, they ignore the influence of the occluded regions themselves and of background interference, so they fail to effectively resolve the misclassification caused by similar appearances in non-occluded regions. Consequently, relying solely on visible regions for recognition leads to a sharp performance drop, and interference from image backgrounds further limits recognition accuracy. Some methods attempt to recover the occluded regions to overcome these issues; specifically, they restore the occluded parts at the image level using the unoccluded image information. However, image-level restoration may cause distortion and introduce an excessive number of parameters.
Method
We propose a person ReID method based on pose guidance and multi-scale feature fusion to alleviate the aforementioned issues. The method enhances the feature representation capability of the model and yields more discriminative features. First, a feature restoration module is constructed to restore occluded image features at the feature level while effectively reducing the number of model parameters. The module uses spatial contextual information from non-occluded regions to predict the features of adjacent occluded regions, thereby restoring the semantic information of the occluded regions in the feature space. The feature restoration module consists of two subparts: an adaptive region division unit and a feature restoration unit. The adaptive region division unit divides the image into six regions according to predicted localization points, which facilitates the clustering of similar feature information across regions. This adaptive division effectively alleviates the misalignment caused by fixed division schemes and achieves more accurate position alignment. The feature restoration unit comprises an encoder and a decoder. The encoder groups the feature information of divided regions with similar appearance or close position into clusters, and the decoder assigns the cluster information to the occluded body parts in the image, completing the feature restoration of missing body parts. Second, a pose estimation network is employed to extract pedestrian pose information. The network guides the generation of keypoint heatmaps over the restored complete image features and predicts body keypoints from these heatmaps to obtain pose information. The pretrained pose estimation guidance model performs fusion learning on the global non-occluded regions and the restored regions, producing more distinctive pedestrian features for more accurate matching. Finally, a feature enhancement module is proposed to extract salient features from the image, eliminating interference from background information while strengthening the learning of effective information. This module not only makes the network attend to the valid semantic information in the feature maps but also reduces interference from background noise, which effectively alleviates the failure of feature learning caused by occlusion.
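The following sketch illustrates the pose guidance and feature enhancement steps under simple assumptions: keypoint heatmaps from a pose estimator act as spatial weights that pool part-level features from the restored feature map, and a lightweight saliency-style gate suppresses low-energy background responses. The fusion and gating rules here are illustrative choices, not the paper's exact design.

```python
# Hedged sketch of pose-guided pooling and saliency-based enhancement
# (illustrative assumptions, not the paper's exact design).
import torch
import torch.nn.functional as F

def pose_guided_pooling(feat_map, heatmaps):
    """feat_map: (B, C, H, W) restored backbone features.
    heatmaps: (B, J, h, w) keypoint heatmaps from a pose estimation network."""
    B, C, H, W = feat_map.shape
    hm = F.interpolate(heatmaps, size=(H, W), mode='bilinear', align_corners=False)
    hm = hm.clamp(min=0)
    hm = hm / (hm.sum(dim=(2, 3), keepdim=True) + 1e-6)  # normalize each joint map
    part_feats = torch.einsum('bchw,bjhw->bjc', feat_map, hm)   # one descriptor per keypoint
    global_feat = feat_map.mean(dim=(2, 3))                     # (B, C) global descriptor
    # concatenating global and part descriptors is one possible fusion choice
    return torch.cat([global_feat.unsqueeze(1), part_feats], dim=1)  # (B, J + 1, C)

def saliency_gate(feat_map):
    # enhance activations that stand out against the average (mostly background) response
    energy = feat_map.pow(2).mean(dim=1, keepdim=True)          # (B, 1, H, W)
    gate = torch.sigmoid(energy - energy.mean(dim=(2, 3), keepdim=True))
    return feat_map * gate
```

In this sketch, applying `saliency_gate` before `pose_guided_pooling` would emphasize body regions over background before the keypoint-weighted pooling.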
Result
We conducted comparative and ablation experiments on three publicly available datasets to validate the effectiveness of our method, using mean average precision (mAP) and Rank-1 accuracy as evaluation metrics. Experimental results show that our method achieves an mAP of 88.8% and a Rank-1 of 95.5% on the Market1501 dataset. On the Duke multi-tracking multi-camera re-identification (DukeMTMC-reID) dataset, the mAP and Rank-1 are 79.2% and 89.3%, respectively. On the occluded Duke multi-tracking multi-camera re-identification (Occluded-DukeMTMC) dataset, the mAP and Rank-1 reach 51.7% and 60.3%, respectively. Moreover, our method outperforms PGMA-Net by 0.4% in mAP on Market1501, by 0.8% in mAP and 0.7% in Rank-1 on DukeMTMC-reID, and by 1.2% in mAP on Occluded-DukeMTMC. The ablation experiments further confirm the effectiveness of the three proposed modules.
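For reference, the two reported metrics can be computed as in the simplified sketch below, which uses cosine distance on L2-normalized features; unlike the standard ReID protocol, it omits the same-camera junk filtering for brevity.

```python
# Simplified computation of Rank-1 accuracy and mAP for ReID evaluation
# (omits the standard same-camera filtering for brevity).
import numpy as np

def rank1_and_map(q_feats, q_ids, g_feats, g_ids):
    # cosine distance between L2-normalized query and gallery features
    q = q_feats / np.linalg.norm(q_feats, axis=1, keepdims=True)
    g = g_feats / np.linalg.norm(g_feats, axis=1, keepdims=True)
    dist = 1.0 - q @ g.T
    rank1_hits, aps = [], []
    for i in range(len(q_ids)):
        order = np.argsort(dist[i])                      # gallery sorted by distance
        matches = (g_ids[order] == q_ids[i]).astype(np.float32)
        rank1_hits.append(matches[0])                    # top-1 correct or not
        if matches.sum() > 0:                            # average precision per query
            precision = np.cumsum(matches) / (np.arange(len(matches)) + 1)
            aps.append(float((precision * matches).sum() / matches.sum()))
    return float(np.mean(rank1_hits)), float(np.mean(aps))
```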
Conclusion
The proposed pose-guided and multi-scale feature fusion (PGMF) method effectively recovers the features of missing body parts, alleviates background interference, and achieves accurate pedestrian matching. The model thus effectively alleviates the misidentification caused by occlusion, improves the accuracy of person ReID, and exhibits good robustness.
Keywords: person re-identification (ReID); occlusion; pose guidance; feature fusion; feature restoration