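Abstract
Objective Existing gesture recognition methods lose accuracy under variations in environment, illumination, rotation, scale, and skin color. To address this, a gesture recognition method for complex backgrounds is proposed that combines gesture detection based on aggregate channel features (ACF) with the dual-tree complex wavelet transform (DTCWT). Method Aggregate channel features are introduced in the gesture image preprocessing stage, and an Adaboost classifier with non-maximum suppression (NMS) is used to detect the target gesture. The DTCWT then decomposes the target gesture image at multiple scales and in multiple directions, and histogram of oriented gradients (HOG) and local binary pattern (LBP) features are extracted from each block of the high- and low-frequency coefficients. Finally, the high- and low-frequency features from all directions are fused and classified with a support vector machine. Result Images covering multiple scenes, multiple subjects, and different angles and distances were selected as the training set, with foreground and background annotated. Recognition experiments were carried out on 11 gestures and compared with a traditional method based on skin color detection and HOG features and with a gesture recognition algorithm based on the Hausdorff-like distance. Under any illumination and distance within a tolerable range, the method recognizes gestures more accurately and in real time, with an average precision of 95.1%. Conclusion In the image preprocessing stage, introducing aggregate channel features enables accurate gesture detection, while extracting and then fusing frequency-domain features of the gesture image based on the DTCWT effectively addresses the low recognition accuracy of traditional single-feature methods on ordinary images under varying illumination and complex backgrounds.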
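The detection stage relies on non-maximum suppression to reduce overlapping candidate windows to one box per hand. The following is a minimal sketch of the common greedy, IoU-based form of NMS. The abstract does not state which overlap criterion or threshold the authors use, so the function names, the `(x1, y1, x2, y2)` box format, and the `iou_threshold=0.5` default are illustrative assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of the kept boxes, highest score first.

    The threshold is an assumed value, not one taken from the paper.
    """
    # Visit candidates from the most to the least confident detection.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap strongly with a kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

With three candidate windows, two of which overlap heavily, `non_max_suppression` keeps the higher-scoring window of the overlapping pair together with the isolated one.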
Gesture recognition based on aggregate channel feature and dual-tree complex wavelet transform

Bao Wenxia, Xie Dongwen, Zhu Ming, Liang Dong (Anhui University)

Objective As society develops, expectations for quality of life and material comfort keep rising, and advances in technology are making everyday life more convenient. Human-computer interaction plays an increasingly important role in daily life and has become a powerful tool for work, life, and play. Although traditional human-computer interaction methods such as keyboards, mice, and touch screens operate accurately, they constrain how people use computers and limit their imagination. Gesture recognition based on images or video streams is therefore a meaningful research direction. Compared with traditional I/O devices, gestures are more natural and flexible, which has made gesture recognition a research hotspot. A growing number of methods process input images or video with machine learning and image processing techniques to achieve real-time gesture interaction, and this approach is an active topic in computer vision. Hand feature information is detected in the extracted image or video stream and the corresponding gesture category is determined, which provides technical support for these applications. In many cases the background behind the person is not uniform but complex and varied, and because people move freely, the illumination, the distance of the hand from the camera, and its angle all vary, so the study of gesture recognition in complex environments has become very important. Existing gesture recognition methods are affected by environment, illumination, rotation, scale, skin color, and similar factors, which lowers recognition accuracy and speed. To address this problem, a gesture recognition method for complex backgrounds is proposed that combines gesture detection based on aggregate channel features (ACF) with frequency-domain feature extraction based on the dual-tree complex wavelet transform (DTCWT). The aggregate channel feature comprises 10 image channels; the pixel features of each channel are processed, filtered, and fused to obtain the ACF.
Method In the gesture image preprocessing stage, a gesture target detection method based on multi-channel feature fusion is introduced as the foundation for gesture recognition. An Adaboost classifier and non-maximum suppression (NMS) are used to detect the target gesture. The target gesture image cropped after detection is processed with the DTCWT: a multi-scale, multi-directional decomposition yields high- and low-frequency coefficients, and histogram of oriented gradients (HOG) and local binary pattern (LBP) features are extracted from each block of these coefficients. Finally, the fused high- and low-frequency features are classified by a trained support vector machine model. The recognition problem is thus divided into two stages. The first stage detects the target region and discards the background, which significantly improves the efficiency of gesture recognition and paves the way for accurate classification in the second stage. Result Images covering multiple scenes, multiple subjects, and different angles and distances were selected as the training set, with foreground and background annotated. Recognition experiments were carried out on 20 kinds of gestures, and the method was compared experimentally with traditional gesture recognition based on skin color detection and HOG features and with a gesture recognition algorithm based on the Hausdorff-like distance. For illumination and distance within any acceptable range, the method recognizes gestures more accurately and in real time, and the average precision reaches 95.1%. Conclusion This algorithm has three advantages.
First, the introduced gesture target detection algorithm locates and crops the hand region accurately even when skin-colored regions in a complex background interfere, and normalizing the cropped region to a fixed size addresses the problems caused by changes in gesture scale. Second, the DTCWT is used to extract the high- and low-frequency coefficients of the image in the frequency domain, and features are computed separately on the high and low frequencies. Extracting features from the different signal components reduces redundancy and feature dimensionality, improves the efficiency of feature extraction, and eliminates the influence of illumination and rotation. Third, the DTCWT offers approximate shift invariance, directional selectivity, and limited redundancy. It is fast to compute and uses little memory, which makes real-time operation achievable. When the gesture region is accurately detected, the proposed algorithm achieves satisfactory results. In future work, we will further improve the accuracy of hand detection and classification. Deep neural networks will be applied to more datasets and gesture types to address the minor factors that may cause misrecognition, to improve recognition efficiency, and to make gesture recognition more practical.
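The block-wise feature extraction and fusion described in the Method can be illustrated with a small sketch of the LBP half of the descriptor. It is not the authors' implementation. It assumes each block is a real-valued 2-D array (for example, the magnitudes of one subband's coefficients; the abstract does not specify this mapping), uses the basic 3x3 LBP operator with a 256-bin histogram, and fuses by plain concatenation. The names `lbp_codes`, `lbp_histogram`, and `fuse` are hypothetical.

```python
def lbp_codes(block):
    """Basic 3x3 LBP code for every interior pixel of a 2-D list of numbers."""
    # Neighbour offsets in clockwise order, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(block), len(block[0])
    codes = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = block[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as large as the centre.
                if block[r + dr][c + dc] >= center:
                    code |= 1 << bit
            codes.append(code)
    return codes


def lbp_histogram(block):
    """Normalised 256-bin histogram of the LBP codes of one block."""
    codes = lbp_codes(block)
    hist = [0.0] * 256
    for code in codes:
        hist[code] += 1.0
    total = len(codes)
    return [h / total for h in hist] if total else hist


def fuse(feature_vectors):
    """Concatenate per-block feature vectors into a single descriptor."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused
```

In a full pipeline, a HOG vector for each block would be concatenated in the same way, and the fused descriptor would be passed to the support vector machine.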