View-aware feature learning for person re-identification
2025, Vol. 30, No. 1: 188-197
Print publication date: 2025-01-16
DOI: 10.11834/jig.240038
Yang S, Zhang Y F, Pu Y L and Yang H Y. 2025. View-aware feature learning for person re-identification. Journal of Image and Graphics, 30(1): 188-197 [DOI: 10.11834/jig.240038]
Objective
In person re-identification, changes in a person's view cause appearance variations, which in turn lead to association errors. Existing methods mitigate this problem through view representation learning and view-based loss functions. However, most view representation learning methods mainly embed view labels and do not explicitly convey the spatial structure of person posture to the model, which weakens the model's ability to perceive view. In addition, view-based loss functions typically cluster persons of the same identity by view, overlooking the mis-associations caused by negative samples with similar appearance and the same view.
Method
To address these challenges, we propose view-aware feature learning for person re-identification. First, we propose posture-based view feature learning, which explicitly captures the spatial structure of human posture. Second, the proposed view-adaptive triplet loss actively enlarges the margin between persons with similar appearance and the same view, thereby separating them.
Result
The proposed method is evaluated on large-scale public person re-identification datasets, including MSMT17 (multi-scene multi-time person ReID dataset) and Market1501. On MSMT17, compared with the second-best model, UniHCP (unified model for human-centric perceptions), Rank-1 and mAP are improved by 1.7% and 1.3%, respectively. Ablation experiments on MSMT17 further confirm that the proposed algorithm effectively improves the association performance of person re-identification.
Conclusion
The proposed method effectively alleviates the degradation of association performance in person re-identification systems caused by the above challenges.
Objective
In the contemporary digital and internet-driven environment, person re-identification (ReID) has become an integral component of domains such as intelligent surveillance, security, and new retail. However, in real-world scenarios, the same person may exhibit significant appearance differences due to changes in view, leading to degraded association performance. Existing methods typically enhance the model’s representation and association capability through view representation learning and view-based loss functions that make the model perceive view information. While these methods have achieved outstanding results, significant challenges remain, which are elaborated below. The first challenge is how person representational capability can be retained in models with implicit view feature learning. In terms of view representation, existing methods based on the Transformer architecture convert view labels into feature vectors through a view embedding layer. Such simple labels hinder the model from perceiving complex posture information. Consequently, these methods learn view features only implicitly; that is, they do not explicitly convey to the model the spatial structure of person posture, such as the positions of keypoints and their topological relationships. As a result, the model cannot precisely perceive person postures and views, which diminishes its representational capability for persons. To address this issue, our method embeds keypoint coordinates and models the topological structure between keypoints. When this structured information is provided to the model, it can understand person postures more intuitively, allowing explicit learning of person posture. The second challenge is how persons with similar appearances and the same view can be separated when anchors are pushed away from hard negatives indiscriminately. With regard to the design of view-based loss functions, many existing methods do not differentiate specific views and instead learn view-generic features, which might strip the model of essential view information. Alternatively, some approaches leverage the triplet loss to reduce feature distances between persons with the same view while increasing the distances between clusters of the same identity with opposing views and bringing clusters of adjacent views closer together. However, our analysis of error cases in real scenarios shows that persons with similar appearances and the same view often rank high in retrieval results, degrading the performance of the ReID system. Moreover, because the aforementioned methods set a uniform margin to push anchors away from hard negative examples, persons with similar appearances and the same view might still not be distinctly separated. To address this issue, we introduce a large margin for different identities with similar appearances and the same view to push them apart. Building on these two ideas, we introduce view-aware feature learning (VAFL) for person ReID to address the outlined challenges.
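As a concrete illustration of the first idea, the following PyTorch sketch embeds keypoint coordinates and propagates them along the skeleton topology with one graph-convolution step in the spirit of Kipf and Welling (2017), then fuses the pooled posture feature with the global appearance feature. This is a minimal sketch under assumed shapes and names (PostureEncoder, dim, and the fusion scheme are all illustrative), not the paper's released module.

```python
import torch
import torch.nn as nn

class PostureEncoder(nn.Module):
    """Illustrative explicit posture encoding: keypoint coordinates + skeleton graph."""
    def __init__(self, dim=768):
        super().__init__()
        self.coord_embed = nn.Linear(2, dim)   # embed (x, y) keypoint coordinates
        self.gcn = nn.Linear(dim, dim)         # one graph-convolution step
        self.fuse = nn.Linear(dim, dim)

    def forward(self, kpts, adj, feat):
        # kpts: (B, K, 2) normalized keypoint coordinates
        # adj:  (K, K) normalized skeleton adjacency with self-loops
        # feat: (B, dim) global appearance feature from the backbone
        tokens = self.coord_embed(kpts)                # (B, K, dim) keypoint tokens
        tokens = torch.relu(adj @ self.gcn(tokens))    # propagate along the skeleton
        posture = tokens.mean(dim=1)                   # (B, dim) pooled posture feature
        return feat + self.fuse(posture)               # posture-aware person feature
```

In contrast to embedding a bare view label, the adjacency matrix hands the model the keypoint topology directly, which is what makes the posture learning explicit.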
Method
First, we propose view feature learning based on person posture (Pos2View). The view of a person is inherently determined by the spatial arrangement of the body parts, which provides key cues about the view. Consequently, we integrate the person’s posture information into the feature map, enhancing the model’s ability to discern the person’s view. Second, we propose a triplet loss with adaptive view (AdaView), which assigns adaptive margins between examples on the basis of their views, thereby optimizing the triplet loss for view awareness. The original triplet loss updates the model by pulling the anchor and the hard positive example closer and pushing the hard negative example away from the anchor. Our proposed AdaView instead emphasizes pushing persons with the same view and similar appearances far apart in the feature space. Specifically, these similar-appearance persons are the hard negative examples in the mini-batch, i.e., the negatives closest to the anchor in Euclidean distance. Given the high visual similarity among images of the same person with the same view, we pull them closer in the feature space, forming sub-clusters of images with the same view; this is reflected in a minimal margin. To make the model sensitive to appearance changes caused by view shifts, we push apart the sub-clusters of the same person with different views, signified by a slightly larger margin. Finally, we deliberately increase the distance between images that have similar appearances but belong to different identities with the same view, reflected by the largest margin. Collectively, these steps define AdaView.
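The following is a minimal PyTorch sketch of a batch-hard triplet loss with view-dependent margins as described above; it is not the authors' released implementation, and the function name adaview_triplet and the concrete margin values m_small, m_mid, and m_large are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def adaview_triplet(feats, pids, views, m_small=0.1, m_mid=0.3, m_large=0.6):
    """Batch-hard triplet loss whose margin adapts to the view relation."""
    dist = torch.cdist(feats, feats)                    # (N, N) pairwise L2 distances
    same_id = pids[:, None] == pids[None, :]
    losses = []
    for i in range(feats.size(0)):
        pos_mask = same_id[i].clone()
        pos_mask[i] = False                             # exclude the anchor itself
        neg_mask = ~same_id[i]
        if not pos_mask.any() or not neg_mask.any():
            continue
        pos_idx = torch.nonzero(pos_mask).flatten()
        neg_idx = torch.nonzero(neg_mask).flatten()
        d_pos, p = dist[i, pos_idx].max(dim=0)          # hardest positive
        d_neg, n = dist[i, neg_idx].min(dim=0)          # hardest negative
        # same-view negatives get the largest push; different-view a milder one
        m_neg = m_large if views[i] == views[neg_idx[n]] else m_mid
        # same-identity/different-view images may sit slightly apart (sub-clusters)
        m_pos = m_small if views[i] != views[pos_idx[p]] else 0.0
        losses.append(F.relu(d_pos - m_pos - d_neg + m_neg))
    return torch.stack(losses).mean() if losses else feats.new_zeros(())
```

In a standard ReID training loop, feats would be the mini-batch embeddings produced by an identity-balanced (P identities × K images) sampler, with pids and views the corresponding identity labels and annotated or predicted view labels.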
Result
In our comprehensive analysis, we assessed the performance of the proposed method against a variety of established techniques in person ReID. Our evaluation encompassed multiple public datasets, including Market1501 (Market), DukeMTMC-ReID, MSMT17, and CUHK03. To gauge the effectiveness of our approach, we used two primary metrics: Rank-1 (R1), which measures the accuracy of the first retrieval result, and mean average precision (mAP), which assesses overall ranking accuracy. We leveraged person view annotations from select datasets and trained a ResNet-based model to predict the views of persons in the MSMT17 dataset. We employed various data augmentation strategies and adhered to the hyperparameter settings of TransReID. In direct comparison with state-of-the-art methods, including classic person ReID techniques and recent advances such as TransReID and UniHCP, the proposed method exhibited superior performance. Specifically, on the MSMT17 dataset, our approach surpassed UniHCP by 1.7% in R1 and 1.3% in mAP. This improvement can be attributed to our VAFL technique, which enhances cluster differentiation and retrieval accuracy. Furthermore, we conducted tests on generalizable person ReID tasks to validate the model’s adaptability and stability in diverse scenarios. Compared with representative generalization methods, our approach demonstrated a slight edge, mainly owing to the VAFL technique’s capacity to refine cluster boundaries and maintain a balance between intraclass compactness and interclass dispersion. Our ablation study revealed that removing the VAFL components significantly reduced performance, highlighting their critical role in the overall effectiveness of the method. These results confirm the robustness and superiority of our approach in person ReID, paving the way for its practical deployment in real-world applications.
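For clarity, a minimal NumPy sketch of the two reported metrics is given below. It assumes a single-query retrieval protocol and omits the camera-ID filtering used in standard ReID benchmarks, so it illustrates the definitions of R1 and mAP rather than reproducing the exact evaluation protocol.

```python
import numpy as np

def rank1_and_map(qf, gf, q_pids, g_pids):
    # qf: (Q, D) query features; gf: (G, D) gallery features
    dist = np.linalg.norm(qf[:, None, :] - gf[None, :, :], axis=2)  # (Q, G)
    r1_hits, aps = [], []
    for i in range(len(qf)):
        order = np.argsort(dist[i])                  # gallery sorted by distance
        matches = g_pids[order] == q_pids[i]         # True where identity matches
        r1_hits.append(matches[0])                   # Rank-1: is the top result correct?
        if matches.any():
            ranks = np.flatnonzero(matches) + 1      # 1-based ranks of true matches
            precision_at_hits = np.arange(1, len(ranks) + 1) / ranks
            aps.append(precision_at_hits.mean())     # average precision for this query
    return float(np.mean(r1_hits)), float(np.mean(aps))
```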
Conclusion
In this paper, we introduce VAFL, which enhances the model’s sensitivity to view and thereby helps distinguish persons with similar appearances and the same view. Experimental results demonstrate that our approach exhibits outstanding performance across various scenarios, confirming its efficiency and reliability.
person re-identification; person view; adaptive view; similar appearances; view perception
Cao Z, Hidalgo G, Simon T, Wei S E and Sheikh Y. 2021. OpenPose: realtime multi-person 2D pose estimation using part affinity fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(1): 172-186 [DOI: 10.1109/TPAMI.2019.2929257]
Chen Z H, Liang Y L and Liu Y F. 2022. Viewpoint contrastive and adversarial learning for unsupervised domain adaptive person re-identification//Proceedings of the 7th International Conference on Signal and Image Processing (ICSIP). Suzhou, China: IEEE: 244-252 [DOI: 10.1109/ICSIP55141.2022.9885803]
Ci Y Z, Wang Y Z, Chen M L, Tang S X, Bai L, Zhu F, Zhao R, Yu F W, Qi D L and Ouyang W L. 2023. UniHCP: a unified model for human-centric perceptions//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, Canada: IEEE: 17840-17852 [DOI: 10.1109/CVPR52729.2023.01711]
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X H, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J and Houlsby N. 2021. An image is worth 16 × 16 words: transformers for image recognition at scale [EB/OL]. [2024-01-02]. https://arxiv.org/pdf/2010.11929.pdf
Feng Z X, Lai J H and Xie X H. 2018. Learning view-specific deep networks for person re-identification. IEEE Transactions on Image Processing, 27(7): 3472-3483 [DOI: 10.1109/TIP.2018.2818438]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
He S T, Luo H, Wang P C, Wang F, Li H and Jiang W. 2021. TransReID: transformer-based object re-identification//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: IEEE: 14993-15002 [DOI: 10.1109/ICCV48922.2021.01474]
Jin X, Lan C L, Zeng W J, Wei G Q and Chen Z B. 2020. Semantics-aligned representation learning for person re-identification [EB/OL]. [2024-01-02]. https://arxiv.org/pdf/1905.13143.pdf
Kipf T N and Welling M. 2017. Semi-supervised classification with graph convolutional networks [EB/OL]. [2024-01-02]. https://arxiv.org/pdf/1609.02907.pdf
Li W, Zhao R, Xiao T and Wang X G. 2014. DeepReID: deep filter pairing neural network for person re-identification//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 152-159 [DOI: 10.1109/CVPR.2014.27]
Liao S C and Shao L. 2021. TransMatcher: deep image matching through transformers for generalizable person re-identification [EB/OL]. [2024-01-02]. https://arxiv.org/pdf/2105.14432.pdf
Liao S C and Shao L. 2022. Graph sampling based deep metric learning for generalizable person re-identification//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE: 7349-7358 [DOI: 10.1109/CVPR52688.2022.00721]
Liu F Y and Zhang L. 2019. View confusion feature learning for person re-identification//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE: 6638-6647 [DOI: 10.1109/ICCV.2019.00674]
Ristani E, Solera F, Zou R, Cucchiara R and Tomasi C. 2016. Performance measures and a data set for multi-target, multi-camera tracking//Proceedings of the 2016 European Conference on Computer Vision. Amsterdam, the Netherlands: Springer International Publishing: 17-35 [DOI: 10.1007/978-3-319-48881-3_2]
Sarfraz M S, Schumann A, Eberle A and Stiefelhagen R. 2018. A pose-sensitive embedding for person re-identification with expanded cross neighborhood re-ranking//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 420-429 [DOI: 10.1109/CVPR.2018.00051]
Schroff F, Kalenichenko D and Philbin J. 2015. FaceNet: a unified embedding for face recognition and clustering//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE: 815-823 [DOI: 10.1109/CVPR.2015.7298682]
Sun Y F, Zheng L, Yang Y, Tian Q and Wang S J. 2018. Beyond part models: person retrieval with refined part pooling (and a strong convolutional baseline) [EB/OL]. [2024-01-02]. https://arxiv.org/pdf/1711.09349.pdf
Wang G S, Yuan Y F, Chen X, Li J W and Zhou X. 2018. Learning discriminative features with multiple granularities for person re-identification [EB/OL]. [2024-01-02]. https://arxiv.org/pdf/1804.01438.pdf
Wang S J, Dou Z P, Fan Y X and Li Y L. 2023. ReID2.0: from person ReID to portrait interpretation. Journal of Image and Graphics, 28(5): 1326-1345 [DOI: 10.11834/jig.220700]
Wei L H, Zhang S L, Gao W and Tian Q. 2018. Person transfer GAN to bridge domain gap for person re-identification//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 79-88 [DOI: 10.1109/CVPR.2018.00016]
Wu Y H and Sang N. 2023. Consistency constraints and label optimization-relevant domain-unsupervised adaptive pedestrians’ re-identification. Journal of Image and Graphics, 28(5): 1372-1383 [DOI: 10.11834/jig.220618]
Yang L L, Lan L, Sun D T, Teng X, Ben X Y and Shen X B. 2023. Newly low-resolution pedestrian re-identification-relevant dataset and its benched method. Journal of Image and Graphics, 28(5): 1346-1359 [DOI: 10.11834/jig.221082]
Yang S, Zhang Y F, Zhao Q H, Pu Y L and Yang H Y. 2023. Prototype-based support example miner and triplet loss for deep metric learning. Electronics, 12(15): #3315 [DOI: 10.3390/electronics12153315]
Ye M, Shen J B, Lin G J, Xiang T, Shao L and Hoi S C H. 2022. Deep learning for person re-identification: a survey and outlook. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(6): 2872-2893 [DOI: 10.1109/TPAMI.2021.3054775]
Zhang Q, Lai J H, Xie X H and Chen H X. 2023. A summary on group re-identification. Journal of Image and Graphics, 28(5): 1225-1241 [DOI: 10.11834/jig.220697]
Zhang Y F, Yang H Y, Zhang Y J, Dou Z P, Liao S C, Zheng W S, Zhang S L, Ye M, Yan Y C, Li J J and Wang S J. 2023. Recent progress in person re-ID. Journal of Image and Graphics, 28(6): 1829-1862 [DOI: 10.11834/jig.230022]
Zhao J, Yuan Y S, Zhang P Y and Wang D. 2023. An efficient Transformer-based object-capturing video annotation method. Journal of Image and Graphics, 28(10): 3176-3190 [DOI: 10.11834/jig.220823]
Zheng L, Shen L Y, Tian L, Wang S J, Wang J D and Tian Q. 2015. Scalable person re-identification: a benchmark//Proceedings of 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE: 1116-1124 [DOI: 10.1109/ICCV.2015.133]
Zhu K, Guo H Y, Liu Z W, Tang M and Wang J Q. 2020. Identity-guided human semantic parsing for person re-identification//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 346-363 [DOI: 10.1007/978-3-030-58580-8_21]
Zhu Z, Jiang X, Zheng F, Guo X, Huang F, Sun X and Zheng W. 2020. Viewpoint-aware loss with angular regularization for person re-identification//Proceedings of the 2020 AAAI Conference on Artificial Intelligence. New York, USA: AAAI: 13114-13121 [DOI: 10.1609/AAAI.V34I07.7014]
Zhuang Z J, Wei L H, Xie L X, Zhang T Y, Zhang H H, Wu H Z, Ai H Z and Tian Q. 2020. Rethinking the distribution gap of person re-identification with camera-based batch normalization//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 140-157 [DOI: 10.1007/978-3-030-58610-2_9]