Person re-identification based on top-view depth head and shoulder sequence
2020, Vol. 25, No. 7, Pages 1393-1407
Print publication date: 2020-07-16
Accepted: 2020-01-27
DOI: 10.11834/jig.190608
Xinnian Wang, Chunhua Liu, Guoqing Qi, Shiqiang Zhang. Person re-identification based on top-view depth head and shoulder sequence[J]. Journal of Image and Graphics, 2020,25(7):1393-1407.
Objective
Person re-identification is the task of matching pedestrians across images or videos captured by one or more cameras, and it is widely used in image retrieval, intelligent security, and related fields. According to camera type and shooting angle, person re-identification algorithms can be mainly divided into those based on side-view RGB cameras and those based on top-view depth cameras. In side-view RGB scenarios, most of a pedestrian's body appearance is visible, whereas in top-view depth scenarios only the structure of the head and shoulders is visible. Most existing algorithms target side-view RGB scenarios, and only a few can be applied directly to top-view depth scenarios, especially low-resolution ones such as videos captured by bus-mounted time-of-flight (TOF) cameras. Therefore, for top-view depth camera scenarios, this paper proposes a person re-identification algorithm based on top-view depth head and shoulder sequences, aiming to improve re-identification accuracy in low-resolution scenarios.
Method
Head regions are detected in the top-view depth head and shoulder sequence and tracked with a Kalman filter to obtain each pedestrian's head image sequence, from which a head depth energy map group (HeDEMaG) is constructed. Depth, area, projection, Fourier descriptor, and histogram of oriented gradient (HOG) features are then extracted from the HeDEMaG. For each feature, the similarity between the HeDEMaGs of two pedestrians is computed, and the per-feature similarities are fused by a weighted sum whose weights are obtained through model learning, yielding an overall similarity score. The pedestrian label with the maximum similarity is taken as the identification result, achieving person re-identification.
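The core HeDEMaG step, grouping a tracked head image sequence by time and averaging each group into an energy map, can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes head images have already been detected, tracked, and orientation-normalized, and the group count and image size here are made up.

```python
import numpy as np

def build_hedemag(head_seq, num_groups=4):
    """Split a tracked head image sequence into num_groups equal time
    segments and average each segment into a head depth energy map;
    the set of these maps forms the HeDEMaG."""
    frames = np.asarray(head_seq, dtype=np.float64)   # shape (T, H, W)
    groups = np.array_split(frames, num_groups, axis=0)
    return [g.mean(axis=0) for g in groups]

# Toy sequence: 12 frames of 8x8 "depth" images with values 0..11.
seq = [np.full((8, 8), t, dtype=np.float64) for t in range(12)]
maps = build_hedemag(seq, num_groups=4)
print(len(maps))        # 4 energy maps
print(maps[0][0, 0])    # mean of frames 0, 1, 2 -> 1.0
```

Averaging within each time segment suppresses per-frame noise while the group sequence still preserves how the head's shape and pose evolve during walking.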
Result
The proposed algorithm is tested on the public indoor single-person TVPR (top view person re-identification) dataset, the self-built indoor multi-person TDPI-L (top-view depth based person identification for laboratory scenarios) dataset, and the real-world bus TDPI-B (top-view depth based person identification for bus scenarios) dataset. Five measures are used to evaluate performance: the rank-1 matching rate, the rank-5 matching rate, macro-F1, the cumulative match characteristic (CMC) curve, and average running time. Rank-1, rank-5, and macro-F1 reach at least 61%, 68%, and 67%, respectively, which is at least 11% higher than typical algorithms.
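Among the reported measures, macro-F1 is the unweighted mean of the per-class F1 scores, so every identity counts equally regardless of how many probe samples it has. A small self-contained sketch with made-up labels (not the paper's data):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# One probe of identity "A" misclassified as "B": per-class F1 = 2/3, 2/3, 1.
print(macro_f1(["A", "A", "B", "C"], ["A", "B", "B", "C"]))  # 7/9 ≈ 0.7778
```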
Conclusion
This paper constructs a head depth energy map group that expresses pedestrians' structural and behavioral characteristics, achieving a multi-feature representation suited to low-resolution pedestrians, and proposes a weight-learning-based similarity fusion that improves recognition accuracy. The algorithm achieves good results on indoor single-person, indoor multi-person, and real-world bus datasets.
Objective
Person re-identification is an important task in video surveillance systems, whose goal is to establish correspondence among images or videos of a person taken by different cameras at different times. According to camera type, person re-identification algorithms can be divided into RGB camera-based and depth camera-based ones. RGB camera-based algorithms generally rely on the appearance characteristics of clothing, such as color and texture, so their performance is greatly affected by external conditions such as illumination variations. In contrast, depth camera-based algorithms are minimally affected by lighting conditions. Person re-identification algorithms can also be divided into side-view-oriented and vertical-view-oriented algorithms according to the camera shooting angle. Most body parts can be seen in side-view scenarios, whereas only the plan view of the head and shoulders can be seen in vertical-view scenarios. Most existing algorithms are designed for side-view RGB scenarios, and only a few of them can be applied directly to top-view depth scenarios; for example, they perform poorly with bus-mounted low-resolution depth cameras. Our focus is person re-identification on depth head and shoulder sequences.
Method
The proposed person re-identification algorithm consists of four modules, namely head region detection, head depth energy map group (HeDEMaG) construction, HeDEMaG-based multi-feature representation and similarity computation, and learning-based score-level fusion and person re-identification. First, the head region detection module detects every head region in each frame. The pixel value in a depth image represents the distance between an object and the camera plane, so the range over which human height is distributed is used to roughly segment candidate head regions. A frame-averaging model is proposed to compute the distance between the floor and the camera plane, and each person's height with respect to the floor is then obtained from the difference between the floor depth and the raw frame. Because a real head region is approximately circular, the circularity ratio of each candidate region is used to remove non-head regions. Second, the HeDEMaG construction module describes the structural and behavioral characteristics of a walking person's head. A Kalman filter and the Hungarian matching method are used to track multiple persons' heads across frames. Because head direction may change over time during walking, a principal component analysis (PCA)-based method is used to normalize the direction of each person's head regions. Each person's normalized head image sequence is uniformly divided into R_t groups in time order to capture the structural and behavioral characteristics of the head over local and overall time periods; the average map of each group is called a head depth energy map, and the set of head depth energy maps is named the HeDEMaG. Third, the HeDEMaG-based multi-feature representation and similarity computation module extracts features and computes the similarity between the probe and the gallery set. The depth, area, projection maps in two directions, Fourier descriptor, and histogram of oriented gradient (HOG) feature of each head depth energy map in the HeDEMaG are proposed to represent a person. The similarity on depth is defined via the ratio of the depth difference to the maximum difference between the probe and the gallery set, and the similarity on area is defined analogously; the similarities on the projections, the Fourier descriptor, and HOG are computed from their correlation coefficients. Fourth, the learning-based similarity score-level fusion and person re-identification module identifies persons according to a similarity score defined as a weighted combination of the five similarity values above. The fusion weights are learned from the training set by minimizing a cost function that measures the recognition error rate. In the experiments, we use the label of the top-ranked image in the gallery list as the predicted label of the probe.
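The score-level fusion and identification step can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the five per-feature similarity values, the gallery size, and the weight values are all hypothetical, and the weights are assumed to have already been learned offline by minimizing the recognition error rate, as described above.

```python
import numpy as np

def fuse_and_identify(sims, weights, gallery_labels):
    """sims: (num_features, num_gallery) per-feature similarities of one
    probe to every gallery person. Fuse them with the learned weights and
    return the label of the top-scoring gallery entry."""
    scores = weights @ sims                 # weighted sum per gallery entry
    return gallery_labels[int(np.argmax(scores))]

# Hypothetical numbers: 5 features (depth, area, projection, Fourier, HOG)
# scored against 3 gallery persons.
sims = np.array([[0.9, 0.2, 0.4],
                 [0.8, 0.3, 0.5],
                 [0.7, 0.6, 0.4],
                 [0.9, 0.1, 0.3],
                 [0.6, 0.5, 0.5]])
weights = np.array([0.3, 0.2, 0.2, 0.2, 0.1])   # assumed learned offline
print(fuse_and_identify(sims, weights, ["A", "B", "C"]))  # A
```

Learning the weights, instead of hand-tuning them, lets the fusion adapt to how discriminative each feature actually is on the training data.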
Result
Experiments are conducted on a public top-view person re-identification (TVPR) dataset and two self-built datasets to verify the effectiveness of the proposed algorithm. TVPR consists of videos recorded indoors by a vertically mounted RGB-D camera, with only one person's walking recorded at a time. We build two datasets, namely top-view depth based person identification for laboratory scenarios (TDPI-L) and top-view depth based person identification for bus scenarios (TDPI-B), to verify performance in multi-person and real-world scenarios. TDPI-L is composed of videos captured indoors by depth cameras, with more than two persons walking in each frame; TDPI-B consists of sequences recorded by bus-mounted low-resolution time-of-flight (TOF) cameras. Five measures, namely rank-1, rank-5, macro-F1, the cumulative match characteristic (CMC) curve, and average running time, are used to evaluate the proposed algorithm. The rank-1, rank-5, and macro-F1 of the proposed algorithm are above 61%, 68%, and 67%, respectively, which is at least 11% higher than those of the state-of-the-art algorithms. Ablation studies and the effects of tracking algorithms and parameters on performance are also discussed.
Conclusion
The proposed algorithm identifies persons in head and shoulder sequences captured by top-view depth cameras. HeDEMaG is proposed to represent the structural and behavioral characteristics of persons, and a learning-based fusion-weight computation method is proposed to avoid manual parameter tuning and improve recognition accuracy. Experimental results show that the proposed algorithm outperforms state-of-the-art algorithms on publicly available indoor videos and on real-world low-resolution bus-mounted videos.
Keywords: depth camera; top-view depth head and shoulder sequence; head depth energy map group (HeDEMaG); similarity fusion weight learning; person re-identification