A Visible-Infrared Person Re-Identification Algorithm Integrating Structural and Visual Features
2025, pp. 1-12
Received: 2024-10-05
Revised: 2025-01-04
Accepted: 2025-02-25
Published online: 2025-02-26
DOI: 10.11834/jig.240600
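Objective
Visible-infrared person re-identification (VI-ReID) is challenging because of the modality gap between visible and infrared images, and the features learned by existing methods are not discriminative enough. This study aims to design a new algorithm that extracts highly discriminative pedestrian features to address this shortcoming in cross-modal recognition.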
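Method
This study proposes a VI-ReID algorithm that fuses structural and visual features through a dual-stream architecture. First, skeletal keypoints are extracted by pose estimation to generate structural feature maps, and a graph convolutional network (GCN) learns the structured information of the skeleton, forming the structural feature extraction branch. In parallel, ResNet50 serves as the visual extraction branch and extracts visual features from the image. On this basis, a Structure-Visual Inter-modal Attention Mechanism (SVIAM) is proposed to fuse skeletal and visual features into a highly discriminative joint feature representation. In addition, to strengthen the consistency of skeletal features, a Structure Cohesion Loss (SCLoss) function is proposed; it continuously optimizes the skeletal features, effectively reduces intra-modal differences, and helps ensure the stability and accuracy of the algorithm.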
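The structural branch starts from a graph whose nodes are skeletal keypoints and whose edges are bones. The sketch below builds, in plain Python, the symmetrically normalized adjacency matrix used by the graph convolution of Kipf and Welling (2016), which the reference list cites. It assumes the 17-keypoint COCO layout and its 19 standard limb connections; the keypoint set actually produced by the paper's pose estimator is not specified in this abstract.

```python
import math

# Assumed keypoint layout: the 17 COCO keypoints
# (0 nose, 1-2 eyes, 3-4 ears, 5-6 shoulders, 7-8 elbows,
#  9-10 wrists, 11-12 hips, 13-14 knees, 15-16 ankles).
NUM_JOINTS = 17

# The 19 limb connections of the COCO keypoint skeleton, 0-indexed.
COCO_EDGES = [
    (15, 13), (13, 11), (16, 14), (14, 12), (11, 12),  # legs and hips
    (5, 11), (6, 12), (5, 6),                          # torso
    (5, 7), (6, 8), (7, 9), (8, 10),                   # arms
    (1, 2), (0, 1), (0, 2), (1, 3), (2, 4),            # face
    (3, 5), (4, 6),                                    # ears to shoulders
]


def normalized_adjacency(num_nodes, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} as a dense list of lists."""
    a = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        a[i][i] = 1.0  # self-loop: each joint keeps its own feature
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0  # bones are undirected
    deg = [sum(row) for row in a]  # degree including the self-loop
    return [
        [a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
        for i in range(num_nodes)
    ]


A_HAT = normalized_adjacency(NUM_JOINTS, COCO_EDGES)
```

Per-joint inputs (for example, keypoint coordinates and confidences) would then be stacked into a feature matrix that the GCN propagates over `A_HAT`.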
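Result
Experiments show that the proposed algorithm performs strongly on the SYSU-MM01 dataset. Compared with the baseline DEEN, it improves Rank-1 accuracy by 4.21% and mAP by 3.52% in the all-search mode, and improves Rank-1 accuracy by 7.39% and mAP by 2.56% in the indoor-search mode.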
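Conclusion
The proposed VI-ReID algorithm, which fuses structural and visual features, effectively improves the accuracy of cross-modal person re-identification and shows high robustness and accuracy in complex scenes.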
Objective
Visible-infrared person re-identification (VI-ReID) has emerged as a challenging task primarily due to the pronounced modal discrepancies between visible and infrared images. In the visible light spectrum, images are replete with vivid colors and intricate textures, yet they are highly susceptible to perturbations caused by varying illumination conditions. For instance, during dawn or dusk, the subdued light can distort the visual appearance of pedestrians, making it arduous to accurately discern their unique features. In contrast, infrared images, which predominantly capture thermal radiation, offer a distinct advantage in low-light or obscured scenarios. However, they lack the detailed visual cues present in their visible counterparts, such as clothing patterns or facial features. These fundamental differences have led to significant difficulties in achieving reliable person re-identification across modalities. Compounding this issue, existing methods have been found wanting in terms of feature discrimination. In numerous real-world datasets and scenarios, they struggle to distinguish between pedestrians with similar postures or occluded body parts, thereby compromising the overall accuracy and reliability of the recognition process.
Method
To address these challenges, this research set out to design an algorithm that extracts highly discriminative pedestrian features, with the aim of closing the remaining gaps in cross-modal recognition. The methodological framework centered on a novel VI-ReID algorithm that combines structural and visual features through a dual-stream architecture. The first step extracted skeletal keypoints via pose estimation. Using established pose estimators such as OpenPose or, in some cases, custom variants with enhanced capabilities, we localized the key joints of the human body even under partial occlusion or extreme postural variation. This proceeded in stages: body regions were detected first, and joint positions were then refined using anatomical constraints and probabilistic models. The extracted keypoints were used to generate structural feature maps, which served as the foundation for further analysis.
Subsequently, a graph convolutional network (GCN) was employed to learn the structured information encoded in the skeleton. The GCN comprised multiple layers, each with a carefully designed node connection pattern, and the activation functions were chosen to propagate relevant information while suppressing noise. Because different joints contribute unequally to characterizing a person's gait or posture, the joints were weighted so that the most discriminative features were emphasized.
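A single GCN layer of the kind described above can be sketched with the propagation rule H' = ReLU(ÂHW) of Kipf and Welling. The abstract does not give the paper's layer count, activations, or joint-weighting scheme, so the softmax joint weighting below is a hypothetical stand-in for its "weighted processing", and the three-joint chain is a toy example chosen so the numbers can be checked by hand.

```python
import math


def matmul(x, y):
    """Matrix product of an (n x k) and a (k x m) list of lists."""
    return [
        [sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def gcn_layer(a_hat, h, w):
    """One graph convolution: ReLU(A_hat @ H @ W).
    a_hat: normalized adjacency (J x J), h: joint features (J x C_in),
    w: weight matrix (C_in x C_out)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]


def weighted_joint_pool(h, scores):
    """Collapse joint features into one structure vector.
    softmax(scores) gives per-joint importance weights (hypothetical scheme;
    in a trained model the scores would be learned)."""
    m = max(scores)
    exp = [math.exp(s - m) for s in scores]  # subtract max for stability
    z = sum(exp)
    alpha = [e / z for e in exp]
    return [sum(alpha[j] * h[j][c] for j in range(len(h))) for c in range(len(h[0]))]


# Toy 3-joint chain (e.g. hip-knee-ankle) with self-loops;
# degrees are 2, 3, 2, so A_hat = D^{-1/2}(A + I)D^{-1/2} is:
s = 1 / math.sqrt(6)
A_HAT = [[0.5, s, 0.0], [s, 1 / 3, s], [0.0, s, 0.5]]
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 2-D input feature per joint
W = [[1.0, 0.0], [0.0, 1.0]]              # identity weights for readability

H1 = gcn_layer(A_HAT, H, W)
structure_vec = weighted_joint_pool(H1, scores=[0.0, 0.0, 0.0])
```

With equal scores the pooling reduces to a plain mean over joints; raising one joint's score shifts the structure vector toward that joint's features.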
The pose-derived skeleton graph and the GCN together formed the structural feature extraction branch. In parallel, ResNet50, a widely used model for visual feature extraction, was adapted to serve as the visual extraction branch. It was fine-tuned to the characteristics of both visible and infrared images by adjusting the pre-trained parameters to the statistical properties of the target datasets and by exploiting outputs from different levels of the network: combining features from early and late layers captures both low-level details and high-level semantic information. In some cases, attention mechanisms were added to focus on the most salient visual regions, further improving the discriminative power of the visual features.
Building on these two parallel streams, a Structure-Visual Inter-modal Attention Mechanism (SVIAM) was proposed to fuse the skeletal and visual features. SVIAM computes correlation weights between the two feature types, and we used detailed schematic diagrams and mathematical formulas to illustrate how attention was distributed over the regions and features most relevant for recognition. Compared with simple concatenation or other basic fusion methods, SVIAM integrated the features more effectively, yielding a more cohesive and discriminative joint feature representation.
Furthermore, to strengthen the consistency of the skeletal features and reduce intra-modal differences, a Structure Cohesion Loss (SCLoss) function was devised.
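The SVIAM fusion step described above is not given as equations in this abstract. For orientation only, the sketch below shows generic scaled dot-product cross-attention in which visual tokens (for example, positions of a ResNet50 feature map) act as queries over skeleton joint features, and the attended structural context is added back residually. Learned query, key, and value projections are omitted, and both feature sets are assumed to share one dimension; this is standard cross-attention, not the paper's mechanism.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    z = sum(e)
    return [v / z for v in e]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention where every query attends over all keys.

    queries: n_q x d, keys: n_k x d, values: n_k x d_v (lists of lists).
    Returns (outputs n_q x d_v, attention weights n_q x n_k).
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        a = softmax(scores)
        weights.append(a)
        outputs.append(
            [sum(a[j] * values[j][c] for j in range(len(values)))
             for c in range(len(values[0]))]
        )
    return outputs, weights


def fuse(visual_tokens, joint_feats):
    """Generic cross-attention fusion (assumed, not the paper's SVIAM):
    visual tokens query the skeleton joints, and the attended structural
    context is added back to each visual token (residual connection)."""
    context, weights = cross_attention(visual_tokens, joint_feats, joint_feats)
    fused = [[v + c for v, c in zip(tok, ctx)]
             for tok, ctx in zip(visual_tokens, context)]
    return fused, weights


visual = [[1.0, 0.0], [0.0, 1.0]]              # two toy visual tokens, d = 2
joints = [[1.0, 1.0], [0.0, 2.0], [2.0, 0.0]]  # three toy joint features, d = 2
fused, attn = fuse(visual, joints)
```

Each row of `attn` shows how one visual token distributes its attention over the joints, which is the kind of correlation map the paragraph above refers to.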
The mathematical formulation of SCLoss was derived from the geometric and topological properties of the skeletal data. Each parameter in the function serves a specific purpose, either penalizing deviations from the expected skeletal structure or encouraging the alignment of related joints. Experimental validation and theoretical analysis showed that SCLoss effectively optimized the skeletal features, improving the overall stability and accuracy of the algorithm.
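The SCLoss formula itself is not reproduced in this abstract. To make the idea of "cohesion" concrete, the sketch below implements a center-loss-style term instead: within a batch, each identity's skeleton features are pulled toward their mean, so features of the same person, from either modality, cluster together. The function name and the batch-centroid formulation are assumptions for illustration, not the paper's definition.

```python
from collections import defaultdict


def cohesion_loss(features, labels):
    """Hypothetical center-loss-style cohesion term (not the paper's SCLoss):
    mean squared distance of each skeleton feature to the batch centroid
    of its identity."""
    groups = defaultdict(list)
    for f, y in zip(features, labels):
        groups[y].append(f)
    # Per-identity centroid computed from the current batch.
    centers = {
        y: [sum(col) / len(fs) for col in zip(*fs)] for y, fs in groups.items()
    }
    total = 0.0
    for f, y in zip(features, labels):
        total += sum((a - b) ** 2 for a, b in zip(f, centers[y]))
    return total / len(features)


# Identity 1 has two spread-out features; identity 2 has a single one.
loss = cohesion_loss([[0.0, 0.0], [2.0, 0.0], [5.0, 5.0]], labels=[1, 1, 2])
```

A term like this is zero only when every identity's features coincide, so in practice it would be combined with an identity classification loss that keeps different people apart.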
Result
The experimental results demonstrate the effectiveness of the algorithm. On the widely used SYSU-MM01 dataset, the proposed algorithm outperformed the baseline DEEN. In the all-search mode, Rank-1 accuracy improved by 4.21% and mean Average Precision (mAP) by 3.52%. In the indoor-search mode, Rank-1 accuracy improved by 7.39% and mAP by 2.56%. These results confirm that the approach improves cross-modal person re-identification accuracy and indicate that it is robust in complex scenarios.
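Rank-1 and mAP, the two metrics reported above, can be computed from a query-gallery distance matrix as sketched below. This is the generic retrieval definition only; the official SYSU-MM01 protocol additionally uses infrared images as queries against a visible gallery, filters gallery images by camera, and averages results over repeated random gallery samplings, none of which this sketch reproduces.

```python
def rank1_and_map(dist, query_ids, gallery_ids):
    """Return (Rank-1 accuracy, mAP) for a retrieval run.

    dist[i][j] is the distance between query i and gallery item j;
    smaller means more similar. Queries with no true match in the
    gallery are skipped, a common convention.
    """
    rank1_hits, ap_sum, valid = 0, 0.0, 0
    for i, row in enumerate(dist):
        order = sorted(range(len(row)), key=lambda j: row[j])  # nearest first
        matches = [gallery_ids[j] == query_ids[i] for j in order]
        if not any(matches):
            continue
        valid += 1
        rank1_hits += int(matches[0])  # top-ranked item is a true match
        hits, precision_sum = 0, 0.0
        for k, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precision_sum += hits / k  # precision at this true match
        ap_sum += precision_sum / hits  # average precision for query i
    if valid == 0:
        raise ValueError("no query has a true match in the gallery")
    return rank1_hits / valid, ap_sum / valid


# Query 0 (id 1) ranks both true matches first; query 1 (id 2) finds its
# only match at rank 2, so r1 == 0.5 and m_ap == 0.75.
dist = [[0.1, 0.5, 0.3], [0.1, 0.3, 0.4]]
r1, m_ap = rank1_and_map(dist, query_ids=[1, 2], gallery_ids=[1, 2, 1])
```

Rank-1 only checks the top result, while mAP rewards ranking every true match early, which is why the two metrics can move by different amounts.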
Conclusion
In conclusion, this research introduced a VI-ReID algorithm that integrates structural and visual features, mitigating the challenges posed by modality differences and substantially improving recognition accuracy in cross-modal person re-identification. Its performance in complex and dynamic environments further attests to its robustness and accuracy and lays a foundation for future work in this area.
Bai Zhongyu, Ding Qichuan, Xu Hongli and Wu Chengdong. 2023. Human Similar Action Recognition Based on Saliency Image Semantic Features. Journal of Image and Graphics, 28(9): 2872-2886 [DOI: 10.11834/jig.220028]
Nguyen D T, Hong H G, Kim K W and Park K R. 2017. Person Recognition System Based on a Combination of Body Images from Visible Light and Thermal Cameras. Sensors, 17(3): 605 [DOI: 10.3390/s17030605]
Feng J, Wu A and Zheng W. 2023. Shape-Erased Feature Learning for Visible-Infrared Person Re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR): 22752-22761 [DOI: 10.48550/arXiv.2304.04205]
Feng Sainan. 2023. Research on Human Action Recognition Method Based on Skeletal Key Points. North China University of Technology
Gong J, Zhao S, Lam K M, Gao X and Shen J. 2023. Spectrum-Irrelevant Fine-Grained Representation for Visible-Infrared Person Re-Identification. Computer Vision and Image Understanding, 232: 103703 [DOI: 10.1016/j.cviu.2023.103703]
Guo J, Ye Y, Du H, et al. 2024. A Triple-Path Global-Local Feature Complementary Network for Visible-Infrared Person Re-Identification. Signal, Image and Video Processing, 18: 911-921 [DOI: 10.1007/s11760-023-02789-4]
He K, Zhang X, Ren S and Sun J. 2016. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR): 770-778 [DOI: 10.1109/CVPR.2016.90]
He Zhimin and Xu Jiayun. 2023. Research Progress on Person Re-identification Algorithms Based on Deep Learning. Intelligent Manufacturing, 3: 80-83
Hu L, Zou X and Zhang P. 2023. Learning Progressive Modality-Shared Transformers for Effective Visible-Infrared Person Re-Identification. In Proceedings of the AAAI Conference on Artificial Intelligence, 37(2): 1835-1843 [DOI: 10.1609/aaai.v37i2.25273]
Huang Guanghong, Lin Guangdong, Wu Erjie, Zhao Xudong and Song Liangliang. 2022. Design of Fixed-Point Algorithm for Softmax Function in Deep Neural Networks. China Integrated Circuit, 31(7): 60-64
Huang Chihan and Shen Xiaobo. 2024. Cross-Modality Person Re-identification Based on Attention Fusion and Feature Enhancement. Journal of Nanjing University of Information Science & Technology [DOI: 10.13878/j.cnki.jnuist.20240330001]
Khandelwal A, Chandra M G and Pramanik S. 2022. On Classifying Images Using Quantum Image Representation. In 2022 IEEE/ACM 7th Symposium on Edge Computing (SEC), Seattle, WA, USA [DOI: 10.1109/SEC54971.2022.00067]
Kim Y. 2014. Convolutional Neural Networks for Sentence Classification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) [DOI: 10.3115/v1/D14-1181]
Kipf T N and Welling M. 2016. Semi-Supervised Classification with Graph Convolutional Networks. arXiv [DOI: 10.48550/arXiv.1609.02907]
Li Rui and Jiang Min. 2023. Pedestrian Re-identification Algorithm Based on Pose Estimation and Feature Similarity. Laser & Optoelectronics Progress [DOI: 10.3788/lop212869]
Liang T, Jin Y, Liu W and Li Y. 2023. Cross-Modality Transformer With Modality Mining for Visible-Infrared Person Re-Identification. IEEE Transactions on Multimedia, 25: 8432-8444 [DOI: 10.1109/TMM.2023.3237155]
Liang Zhijun and Liu Dong. 2021. A Human-Object Interaction Detection Module Network Based on Pose Information. Application Research of Computers, 38(8): 4 [DOI: 10.19734/j.issn.1001-3695.2020.11.0429]
Liu H, Ma S, Xia D and Li S. 2023. SFANet: A Spectrum-Aware Feature Augmentation Network for Visible-Infrared Person Re-identification. IEEE Transactions on Neural Networks and Learning Systems, 34(4): 1958-1971 [DOI: 10.1109/TNNLS.2021.3105702]
Liu J, Wang J, Huang N, Zhang Q and Han J. 2022. Revisiting Modality-Specific Feature Compensation for Visible-Infrared Person Re-Identification. IEEE Transactions on Circuits and Systems for Video Technology, 32(10): 7226-7240 [DOI: 10.1109/TCSVT.2022.3168999]
Liu Jing. 2021. Research on Smoking Action Recognition Method Based on Skeletal Information. China University of Mining and Technology
Lv Y, Wang G, Zhao W and Guan Z. 2024. Edge-Weight-Embedding Graph Convolutional Network for Person Re-identification. IEEE Intelligent Systems, 39(4): 74-82 [DOI: 10.1109/MIS.2024.3385381]
Miao Y, Huang N, Ma X and Han J. 2023. On Exploring Pose Estimation as an Auxiliary Learning Task for Visible-Infrared Person Re-identification. Neurocomputing, 556 [DOI: 10.48550/arXiv.2201.03859]
Nayeem T A, Motaharuzzaman S M, Hoque A T and Rahman M H. 2022. Computer Vision Based Object Detection and Recognition System for Image Searching. In 2022 12th International Conference on Electrical and Computer Engineering (ICECE), Dhaka, Bangladesh [DOI: 10.1109/ICECE57408.2022.10089019]
Qian Y and Tang S-K. 2024. Pose Attention-Guided Paired-Images Generation for Visible-Infrared Person Re-Identification. IEEE Signal Processing Letters, 31: 346-350 [DOI: 10.1109/LSP.2024.3354190]
Radenović F, Tolias G and Chum O. 2019. Fine-Tuning CNN Image Retrieval with No Human Annotation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(7): 1655-1668 [DOI: 10.1109/TPAMI.2018.2858820]
Rao H, Leung C and Miao C. 2023. Hierarchical Skeleton Meta-Prototype Contrastive Learning with Hard Skeleton Mining for Unsupervised Person Re-Identification. arXiv: 2307.12917 [DOI: 10.48550/arXiv.2307.12917]
Wu A, Zheng W-S, Yu H-X, Gong S and Lai J. 2017. RGB-Infrared Cross-Modality Person Re-identification. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy [DOI: 10.1109/ICCV.2017.575]
Xia D, Liu H, Xu L and Wang L. 2021. Visible-Infrared Person Re-Identification with Data Augmentation via Cycle-Consistent Adversarial Network. Neurocomputing [DOI: 10.1016/j.neucom.2021.02.088]
Yang Hewen, Yang Pingping and Guo Haichen. 2018. Research on Gesture Recognition Based on Skeletal Information. Computer Applications and Software [DOI: 10.3969/j.issn.1000-386x.2018.12.042]
Yang Lei. 2023. Deep Learning in Person Re-identification: A Review. China Water Transport (Second Half), 23(7): 57-59 [DOI: 10.3969/j.issn.1006-7973.2023.07.0057]
Yang X, Dong W, Li M, Wei Z, Wang N and Gao X. 2024. Cooperative Separation of Modality Shared-Specific Features for Visible-Infrared Person Re-Identification. IEEE Transactions on Multimedia [DOI: 10.1109/TMM.2024.3377139]
Ye M, Shen J, Lin G, Xiang T and Hoi S C H. 2021. Deep Learning for Person Re-identification: A Survey and Outlook. IEEE Transactions on Pattern Analysis and Machine Intelligence [DOI: 10.1109/TPAMI.2021.3054775]
Yu H, Cheng X, Peng W, Liu W and Zhao G. 2023. Modality Unifying Network for Visible-Infrared Person Re-identification. In Proceedings of the IEEE International Conference on Computer Vision (ICCV): 11185-11195 [DOI: 10.48550/arXiv.2309.06262]
Zhang Boxing, Zhang Shouming and Zhong Zhenyu. 2022. Person Re-identification Based on Multi-granularity Feature Fusion Network. Optoelectronics & Laser [DOI: 10.16136/j.joel.2022.09.0886]
Zhang H, Cheng S and Du A. 2024. Multi-Stage Auxiliary Learning for Visible-Infrared Person Re-identification. IEEE Transactions on Circuits and Systems for Video Technology [DOI: 10.1109/TCSVT.2024.3425536]
Zhang Y, Lu Y, Yan Y, Wang H and Li X. 2024. Frequency Domain Nuances Mining for Visible-Infrared Person Re-Identification. arXiv: 2401.02162 [DOI: 10.48550/arXiv.2401.02162]
Zhang Y and Wang H. 2023. Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR): 2153-2162 [DOI: 10.48550/arXiv.2303.14481]
He Z, Zhao H, Wang J and Feng W. 2023. Pose Matters: Pose Guided Graph Attention Network for Person Re-identification. Chinese Journal of Aeronautics, 5 [DOI: 10.48550/arXiv.2111.14411]
Zhu Min, Ming Zhangqiang, Yan Jianrong, Yang Yong and Zhu Jiamin. 2022. A Review of Person Re-identification Methods Based on Generative Adversarial Networks. Journal of Computer-Aided Design & Computer Graphics, 34(2): 17 [DOI: 10.3724/SP.J.1089.2022.18852]