Pose guidance and multi-scale feature fusion for occluded person re-identification
2024, Vol. 29, No. 8, Pages 2364-2376
Print publication date: 2024-08-16
DOI: 10.11834/jig.230523
Zhang Hongying, Liu Tengfei, Luo Qian, Zhang Tao. 2024. Pose guidance and multi-scale feature fusion for occluded person re-identification. Journal of Image and Graphics, 29(08):2364-2376
Objective
In person re-identification, occlusion alters pedestrian appearance and reduces the discriminability of pedestrian features, so conventional methods that rely only on the visible parts still misidentify pedestrians. To address this problem, we propose an occluded person re-identification method that fuses pose guidance and multi-scale features.
Method
First, a feature restoration module is constructed that uses information from the regions adjacent to an occlusion to recover the semantic information of occluded areas in the feature space, thereby repairing the features of missing parts (see the sketch after this paragraph). Second, to extract effective pose information from the restored image features, a pose guidance module is designed that uses pose estimation to guide feature extraction and achieve more accurate pedestrian matching. Finally, a feature enhancement module is built that incorporates salient-region detection to strengthen informative body-part features while suppressing interference from background information.
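To make the restoration idea concrete, below is a minimal PyTorch sketch of feature-level restoration, assuming pooled features for a small set of divided regions and a visibility mask. The soft-clustering encoder, the prototype-based decoder, and all names and shapes are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of feature-level restoration (illustrative, not the paper's code):
# visible-region features are encoded into cluster prototypes, and occluded
# regions are re-expressed from those prototypes.
import torch
import torch.nn as nn

class FeatureRestoration(nn.Module):
    def __init__(self, dim=2048, num_clusters=6):
        super().__init__()
        self.to_assign = nn.Linear(dim, num_clusters)  # soft cluster assignment
        self.to_query = nn.Linear(dim, dim)            # decoder query projection

    def forward(self, region_feats, visible_mask):
        # region_feats: (B, R, C) pooled features of R divided regions
        # visible_mask: (B, R) float, 1 for visible regions, 0 for occluded ones
        assign = self.to_assign(region_feats).softmax(dim=-1)      # (B, R, K)
        assign = assign * visible_mask.unsqueeze(-1)               # encode visible regions only
        clusters = torch.einsum('brk,brc->bkc', assign, region_feats)
        clusters = clusters / (assign.sum(dim=1, keepdim=True).transpose(1, 2) + 1e-6)
        # decoder: every region (including occluded ones) attends to the prototypes
        attn = torch.einsum('brc,bkc->brk', self.to_query(region_feats), clusters)
        restored = torch.einsum('brk,bkc->brc', attn.softmax(dim=-1), clusters)
        # keep original features where visible, use restored ones where occluded
        m = visible_mask.unsqueeze(-1)
        return m * region_feats + (1 - m) * restored
```

For instance, calling the module with `torch.randn(2, 6, 2048)` and a (2, 6) binary mask returns a (2, 6, 2048) tensor in which the occluded rows are replaced by mixtures of the visible-region prototypes.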
Result
Comparative and ablation experiments were conducted on three public datasets. The mean average precision (mAP) and Rank-1 accuracy reach 88.8% and 95.5% on Market1501, 79.2% and 89.3% on DukeMTMC-reID (Duke multi-tracking multi-camera re-identification), and 51.7% and 60.3% on Occluded-DukeMTMC (occluded Duke multi-tracking multi-camera re-identification), respectively. The comparative results show that the proposed fusion method improves the accuracy of pedestrian matching and is competitive with existing approaches.
Conclusion
The proposed pose guidance and multi-scale fusion method restores part features lost to occlusion and, guided by pose information, fuses image features of different granularities. It improves the recognition accuracy of the model and effectively alleviates the misidentification caused by occlusion, which verifies the effectiveness of the method.
Objective
Person re-identification (ReID) is an important task in computer vision that aims to identify and associate the same person across multiple surveillance cameras by extracting and matching pedestrian features under different scenarios. Occluded person ReID is a particularly challenging and specialized variant of this task. In real-world settings, occlusion is common and limits the practical application of ReID techniques to a certain extent. Recently, occluded person ReID has attracted increasing attention, and several methods have been proposed to address occlusion, achieving impressive results. These methods primarily focus on the visible regions in images: they first locate the visible regions and then design dedicated models to extract discriminative features from those regions for accurate person matching. Such methods typically discard features from the occluded areas and rely on discriminative features from the non-occluded regions for matching. Although they achieve impressive results, they ignore the influence of the occluded regions themselves and of background interference, so they fail to effectively resolve the misclassification caused by similar appearances in non-occluded regions. Consequently, relying solely on visible regions for recognition leads to a sharp performance drop, and interference from image backgrounds further limits recognition accuracy. Some methods attempt to recover the occluded regions to overcome these issues; specifically, they restore the occluded parts at the image level using the unoccluded image information. However, image-level restoration may cause distortion and introduce an excessive number of parameters.
Method
We propose a person ReID method based on pose guidance and multi-scale feature fusion to alleviate the aforementioned issues. The method enhances the feature representation capability of the model and yields more discriminative features. First, a feature restoration module is constructed to restore occluded image features at the feature level while effectively reducing the number of model parameters. The module uses spatial contextual information from non-occluded regions to predict the features of adjacent occluded regions, thereby restoring the semantic information of the occluded regions in the feature space. The feature restoration module consists of two subparts: an adaptive region division unit and a feature restoration unit. The adaptive region division unit divides the image into six regions according to predicted localization points, which facilitates the clustering of similar feature information across regions. This adaptive division effectively alleviates the misalignment caused by fixed division schemes and achieves more accurate position alignment. The feature restoration unit comprises an encoder and a decoder. The encoder groups the feature information of divided regions with similar appearance or close position into clusters, and the decoder assigns the cluster information to the occluded body parts in the image, completing the feature restoration of missing body parts. Second, a pose estimation network is employed to extract pedestrian pose information. The network guides the generation of keypoint heatmaps over the restored complete image features and predicts body keypoints from these heatmaps to obtain pose information. The pretrained pose estimation guidance model performs fusion learning on the global non-occluded regions and the restored regions, producing more distinctive pedestrian features for more accurate matching. Finally, a feature enhancement module is proposed to extract salient features from the image, eliminating interference from background information while strengthening the learning of effective information. This module not only makes the network attend to the valid semantic information in the feature maps but also reduces interference from background noise, which effectively alleviates the failure of feature learning caused by occlusion.
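The following sketch illustrates the pose guidance and feature enhancement steps under simple assumptions: keypoint heatmaps from a pose estimator act as spatial weights that pool part-level features from the restored feature map, and a lightweight saliency-style gate suppresses low-energy background responses. The fusion and gating rules here are illustrative choices, not the paper's exact design.

```python
# Hedged sketch of pose-guided pooling and saliency-based enhancement
# (illustrative assumptions, not the paper's exact design).
import torch
import torch.nn.functional as F

def pose_guided_pooling(feat_map, heatmaps):
    """feat_map: (B, C, H, W) restored backbone features.
    heatmaps: (B, J, h, w) keypoint heatmaps from a pose estimation network."""
    B, C, H, W = feat_map.shape
    hm = F.interpolate(heatmaps, size=(H, W), mode='bilinear', align_corners=False)
    hm = hm.clamp(min=0)
    hm = hm / (hm.sum(dim=(2, 3), keepdim=True) + 1e-6)  # normalize each joint map
    part_feats = torch.einsum('bchw,bjhw->bjc', feat_map, hm)   # one descriptor per keypoint
    global_feat = feat_map.mean(dim=(2, 3))                     # (B, C) global descriptor
    # concatenating global and part descriptors is one possible fusion choice
    return torch.cat([global_feat.unsqueeze(1), part_feats], dim=1)  # (B, J + 1, C)

def saliency_gate(feat_map):
    # enhance activations that stand out against the average (mostly background) response
    energy = feat_map.pow(2).mean(dim=1, keepdim=True)          # (B, 1, H, W)
    gate = torch.sigmoid(energy - energy.mean(dim=(2, 3), keepdim=True))
    return feat_map * gate
```

In this sketch, applying `saliency_gate` before `pose_guided_pooling` would emphasize body regions over background before the keypoint-weighted pooling.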
Result
We conducted comparative and ablation experiments on three publicly available datasets to validate the effectiveness of our method, using mean average precision (mAP) and Rank-1 accuracy as evaluation metrics. Experimental results show that our method achieves an mAP of 88.8% and a Rank-1 of 95.5% on the Market1501 dataset. On the Duke multi-tracking multi-camera re-identification (DukeMTMC-reID) dataset, the mAP and Rank-1 are 79.2% and 89.3%, respectively. On the occluded Duke multi-tracking multi-camera re-identification (Occluded-DukeMTMC) dataset, the mAP and Rank-1 reach 51.7% and 60.3%, respectively. Moreover, our method outperforms PGMA-Net by 0.4% in mAP on Market1501, by 0.8% in mAP and 0.7% in Rank-1 on DukeMTMC-reID, and by 1.2% in mAP on Occluded-DukeMTMC. The ablation experiments further confirm the effectiveness of the three proposed modules.
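For reference, the two reported metrics can be computed as in the simplified sketch below, which uses cosine distance on L2-normalized features; unlike the standard ReID protocol, it omits the same-camera junk filtering for brevity.

```python
# Simplified computation of Rank-1 accuracy and mAP for ReID evaluation
# (omits the standard same-camera filtering for brevity).
import numpy as np

def rank1_and_map(q_feats, q_ids, g_feats, g_ids):
    # cosine distance between L2-normalized query and gallery features
    q = q_feats / np.linalg.norm(q_feats, axis=1, keepdims=True)
    g = g_feats / np.linalg.norm(g_feats, axis=1, keepdims=True)
    dist = 1.0 - q @ g.T
    rank1_hits, aps = [], []
    for i in range(len(q_ids)):
        order = np.argsort(dist[i])                      # gallery sorted by distance
        matches = (g_ids[order] == q_ids[i]).astype(np.float32)
        rank1_hits.append(matches[0])                    # top-1 correct or not
        if matches.sum() > 0:                            # average precision per query
            precision = np.cumsum(matches) / (np.arange(len(matches)) + 1)
            aps.append(float((precision * matches).sum() / matches.sum()))
    return float(np.mean(rank1_hits)), float(np.mean(aps))
```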
Conclusion
The proposed pose-guided and multi-scale feature fusion (PGMF) method effectively recovers the features of missing body parts, alleviates background interference, and achieves accurate pedestrian matching. The model thus effectively alleviates the misidentification caused by occlusion, improves the accuracy of person ReID, and exhibits good robustness.
Keywords: person re-identification (ReID); occlusion; pose guidance; feature fusion; feature restoration