ReID2.0: from person ReID to portrait interpretation
Vol. 28, Issue 5, Pages: 1326-1345 (2023)
Published: 16 May 2023
DOI: 10.11834/jig.220700
Wang Shengjin, Dou Zhaopeng, Fan Yixuan, Li Yali. 2023. ReID2.0: from person ReID to portrait interpretation. Journal of Image and Graphics, 28(05):1326-1345
Person re-identification (Person ReID) refers to using computer vision techniques to recognize a specific pedestrian, observed in the video of one camera, when he or she reappears in other cameras at different times and locations, or to retrieve a specific pedestrian from image or video databases. Person ReID research has strong practical demand, with potential applications in public security, new retail, and human-computer interaction, as well as significant theoretical value for machine learning and computer vision. Pedestrian imaging involves complex variations in pose, viewpoint, illumination, and imaging quality, along with a certain degree of occlusion, so person ReID faces substantial technical challenges. In recent years, academia and industry have invested enormous effort and resources in this problem and made progress: the mean average precision (mAP) on multiple datasets has improved considerably, and some methods have entered practical use. Nevertheless, current person ReID research focuses mainly on clothing-appearance features and lacks an explicit multi-view observation and description of pedestrian appearance, which is inconsistent with the mechanism of human observation. This paper aims to break the existing setting of the person ReID task and form a comprehensive observational description of pedestrians. To advance person ReID research, we propose, on the basis of our prior person ReID work, the concept of portrait interpretation (ReID2.0). Portrait interpretation observes and describes the static attributes and motion-like states of portraits from four aspects: appearance, posture, emotion, and intention. We construct a new benchmark dataset, Portrait250K, containing 250 000 portraits with eight manually annotated labels corresponding to eight subtasks, and propose a new evaluation metric. The proposed portrait interpretation forms a comprehensive observational description of pedestrians from multi-view appearance information, providing a reference for further research on person ReID 2.0 and human-like agents.
Person re-identification (Person ReID) has attracted increasing attention in computer vision. It aims to identify a target pedestrian appearing in the images of one camera and to recognize his or her re-appearances at other times and places, and it can also be used to retrieve specific pedestrians from image or video databases. Person ReID research has strong practical demand and potential applications in public safety, new retail, and human-computer interaction. Conventional cooperative face recognition provides one of the most powerful technical means for identity verification, but it is constrained by rigid requirements on imaging angle and distance, and semi-cooperative face recognition is still evolving technically. In public surveillance, however, a large number of non-cooperative scenarios must be handled: the monitored subjects do not need to cooperate with the camera, need not be aware that they are being filmed, and, in some extreme cases, suspects may even deliberately cover their key biometric features. To provide wide-range spatio-temporal tracking, public-security surveillance urgently calls for person re-identification. Supported by ReID technology, a pedestrian can be picked out even from the back, and facial features can be interpreted further when they become observable. The defining characteristic of the person ReID task is that the recognition object is a non-cooperative target. Pedestrian imaging is challenged by complex variations in pose, viewing angle, illumination, and imaging quality, together with a certain range of occlusion. The key challenge lies in learning discriminative feature representations from image data that vary over time and space.
In addition, compared with face recognition, data collection and labeling are more challenging in person re-identification, and existing datasets are far smaller and less diverse than face recognition datasets, a gap that needs to be bridged. As a result, learned feature extractors commonly suffer from severe overfitting, and cross-dataset heterogeneity of models remains a major challenge; interdisciplinary research is needed for a breakthrough. In recent years, Rank-1 accuracy and mean average precision (mAP) have improved greatly on multiple datasets, and some methods have begun to be applied in practice. However, current person re-identification research mainly focuses on clothing-appearance features and lacks an explicit multi-view observation and description of pedestrian appearance, which is inconsistent with the mechanism of human observation. Human perception can generate a comprehensive observational description of a target from multi-view appearance information. For example, when we meet a familiar friend on the street, we recognize him or her almost subconsciously even if the face is not clearly visible. Beyond clothing, we also perceive richer contextual information, including gender, age, body shape, posture, facial expression, and mental state. This paper aims to break the existing setting of the person re-identification task and form a comprehensive observational description of pedestrians. To advance person re-identification research, we propose portrait interpretation (ReID2.0) on the basis of prior person ReID work. The static attributes and motion-like states of a portrait are observed and described from four aspects: 1) appearance, 2) posture, 3) emotion, and 4) intention.
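The Rank-1 and mAP scores mentioned above can be computed from per-query retrieval rankings. The following is a minimal sketch in plain Python; the function name and data layout are illustrative assumptions, not the evaluation code of any particular benchmark.

```python
# Hedged sketch of Rank-1 accuracy and mean average precision (mAP) for
# retrieval: each query contributes a ranked gallery list and the set of
# ground-truth matching identities.

def rank1_and_map(rankings):
    """rankings: list of (ranked_gallery_ids, relevant_id_set) per query."""
    rank1_hits, ap_sum = 0, 0.0
    for ranked, relevant in rankings:
        # Rank-1: the top-ranked gallery item is a true match.
        if ranked and ranked[0] in relevant:
            rank1_hits += 1
        # Average precision: mean of precision@k over ranks of true matches.
        hits, precisions = 0, []
        for k, gid in enumerate(ranked, start=1):
            if gid in relevant:
                hits += 1
                precisions.append(hits / k)
        ap_sum += sum(precisions) / max(len(relevant), 1)
    n = len(rankings)
    return rank1_hits / n, ap_sum / n
```

Averaging the per-query average precisions yields the mAP reported on ReID benchmarks.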
Here, appearance information describes the apparent information of the face and biological characteristics; posture information focuses on the static and sequential body-shape characteristics of the human body; emotion information targets the facial expression and emotional state of a pedestrian; and intention information covers the behavioral description and intention prediction of a pedestrian. These four types of information are based on multi-view observation and perception of pedestrians and, to a certain extent, constitute a human-centered representation. Owing to the difficulty of labeling, no existing dataset meets the description requirements of these four aspects. We therefore present Portrait250K, a benchmark dataset for portrait interpretation. Portrait250K consists of 250 000 portraits drawn from 51 movies and TV series from various countries. Each portrait carries eight human-annotated labels corresponding to eight subtasks. The distribution of images and labels reflects real-world characteristics, such as a) long-tailed and unbalanced distributions, b) diverse occlusions, c) truncations, d) lighting, e) clothing, f) makeup, and g) changeable background scenes. To advance portrait interpretation research on Portrait250K, we design metrics for each subtask and systematically develop an integrated evaluation metric, called portrait interpretation quality (PIQ), which balances the weights of the subtasks. Furthermore, we design a baseline method under a multi-task learning paradigm. Focusing on multi-task representation learning, we demonstrate a scheme named feature space separation.
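A feature-space-separation scheme of the kind named above can be sketched as partitioning one shared feature vector into disjoint per-task subspaces, so that each subtask head reads only its own slice. The dimensions and task names below are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch of feature space separation for multi-task learning: a flat
# shared representation is split into disjoint sub-vectors, one per subtask.

def separate_feature_space(feature, task_dims):
    """Split a flat feature vector into per-task slices.

    feature: list of floats (shared representation from a backbone).
    task_dims: list of (task_name, dim) pairs; dims must sum to len(feature).
    """
    assert sum(d for _, d in task_dims) == len(feature)
    slices, offset = {}, 0
    for task, dim in task_dims:
        slices[task] = feature[offset:offset + dim]
        offset += dim
    return slices

# Usage: a 16-dim shared feature split across four illustrative subtask groups.
feat = [float(i) for i in range(16)]
parts = separate_feature_space(feat, [("appearance", 6), ("posture", 4),
                                      ("emotion", 3), ("intention", 3)])
```

Because the slices are disjoint, gradients from one subtask head only shape its own subspace, which is one way to reduce destructive interference between tasks.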
A simple learning loss is proposed as well. The proposed portrait interpretation forms a comprehensive observational description of pedestrians, which provides a reference for further research on person re-identification and human-like agents.
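An integrated metric such as the PIQ described earlier can, under stated assumptions, be viewed as a weighted combination of per-subtask scores. The weights and subtask names below are illustrative; the paper defines its own weighting.

```python
# Hedged sketch: an integrated portrait-interpretation score as a weighted
# combination of per-subtask scores. Weights must sum to 1 so the combined
# score stays on the same scale as the subtask scores.

def piq(subtask_scores, weights):
    """subtask_scores, weights: dicts keyed by subtask name."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(weights[t] * subtask_scores[t] for t in subtask_scores)

# Illustrative scores for four hypothetical subtask groups, equally weighted.
scores = {"appearance": 0.8, "posture": 0.6, "emotion": 0.7, "intention": 0.5}
uniform = {t: 0.25 for t in scores}
```

Choosing non-uniform weights lets the metric emphasize subtasks considered more important or more reliable.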
person re-identification (Person ReID); portrait interpretation; ReID2.0; representation learning; computer vision