Safety helmet wearing detection method of fusing environmental features and improved YOLOv4
Vol. 26, Issue 12, Pages 2904-2917 (2021)
Published: 16 December 2021
Accepted: 13 January 2021
DOI: 10.11834/jig.200606
Qingqing Ge, Zhijie Zhang, Long Yuan, Xiumei Li, Junmei Sun. 2021. Safety helmet wearing detection method of fusing environmental features and improved YOLOv4 [J]. Journal of Image and Graphics, 26(12): 2904-2917
Abstract
Objective: On construction sites, the safety helmet is the most common and practical piece of personal protective equipment, effectively preventing and mitigating head injuries caused by accidents. However, in helmet-wearing detection on construction sites, small targets are often hard to detect, and complex, changeable environmental factors reduce detection accuracy. To address these problems, a safety helmet wearing detection method fusing environmental features with an improved YOLOv4 (you only look once version 4) is proposed.
Method: To recover features lost during convolution, pooling, and similar operations, the three output feature maps of different sizes produced by YOLOv4 are added, under the condition that their receptive fields are consistent, to feature maps extracted from the original image, fusing high-level and low-level features to capture more detailed information. A 3×3 convolution is then applied to the fused feature maps to reduce the aliasing effect of fusion and keep the features stable. To adapt to the varied environments of construction sites, multiple data augmentation methods simulate environmental conditions, and adversarial training strengthens the generalization ability and robustness of the model.
Result: Tested on the open-source safety helmet wearing dataset (SHWD), the improved YOLOv4 reaches a mean average precision (mAP) of 91.55%, outperforming several popular object detection algorithms; compared with YOLOv4, mAP improves by 5.2%. After data augmentation with fused environmental features, mAP improves by a further 4.27%, and the method performs stably when tested under various real-world conditions.
Conclusion: The proposed method, fusing environmental features with an improved YOLOv4, improves the accuracy, generalization ability, and robustness of the model through model improvement and data augmentation, providing an effective guarantee for safety helmet wearing detection.
Objective
National production safety data in 2019 showed that 95% of production safety accidents were caused by unsafe behaviors of workers, including improperly wearing protective equipment. Safety helmet wearing detection therefore plays a vital role in production safety. An end-to-end detection algorithm with high accuracy and strong generalization ability is of great significance for ensuring operators' personal safety and reducing safety accidents. Safety helmet wearing detection belongs to the category of object detection. Early object detection algorithms mostly relied on manually constructed features. With the development of deep learning, object detection has divided into "two-stage detection" and "one-stage detection", and these series of detectors greatly improved both detection speed and detection accuracy. However, current deep learning algorithms still fail to ensure accurate detection of small targets and are not generally applicable across scenarios, resulting in poor generalization ability and weak anti-interference ability. To solve these problems, a safety helmet wearing detection method that combines environmental characteristics with an improved you only look once version 4 (YOLOv4) is proposed to achieve efficient detection of safety helmet wearing.
Method
Based on YOLOv4, cross stage partial Darknet53 (CSPDarknet53) is used as the backbone network, with a path aggregation network (PANet) and spatial pyramid pooling (SPP) as the neck. YOLOv4 produces feature maps at three different scales: with an input size of 608×608 pixels, the resolutions of the YOLO heads are 76×76, 38×38, and 19×19 pixels, respectively. Because the information in high-level and low-level feature maps differs greatly, features are also extracted directly from the original input image until they reach the same resolutions as the YOLO heads. For the original image, a 3×3 convolution is applied, followed by a batch normalization (BN) layer and a ReLU activation, which provides unilateral suppression and sparse activation. This process is iterated until the resolution of the feature map matches the size of the corresponding YOLO head. Then, under the condition that the receptive fields are consistent, the three output feature maps of YOLOv4 are added to the feature maps extracted from the original image, fusing high-level and low-level features to capture more detailed information. After that, a 3×3 convolution is applied to each fused feature map to reduce the aliasing effect of fusion, yielding outputs at the three scales. The feature maps extracted from the original image represent a shallow network with high resolution and more detailed features, which helps predict location information, while the YOLO heads represent a deep network with low resolution and more semantic features, which helps decide category information. By combining the two kinds of feature maps, the model achieves higher accuracy on both large and small targets. Moreover, data augmentation techniques, such as random cropping, CutMix, simulating environments corrupted with noise, and adversarial training with adversarial samples, add small disturbances to the training data. Augmenting the training data in this way improves the generalization ability and the robustness of the model.
Result
The improved YOLOv4 is tested on the open-source safety helmet wearing dataset (SHWD). The mean average precision (mAP) reaches 91.55% and the recall reaches 98.62%. Compared with existing algorithms such as CornerNet-Lite, Faster region-based convolutional neural network (Faster R-CNN), YOLOv3, and YOLOv4, the proposed method improves both mAP and recall. Compared with the original YOLOv4, the improved YOLOv4 increases mAP and recall by 5.2% and 5.79%, respectively. The data augmentation methods adopted in this paper improve the mAP of CornerNet-Lite, Faster R-CNN, YOLOv3, YOLOv4, and the improved YOLOv4 by 2% to 5%; for the improved YOLOv4, mAP increases by 4.27%, from 91.55% to 95.82%. In addition, the proposed method performs more stably after data augmentation when tested under different environmental conditions. For instance, detection performance on night images improves markedly, with mAP increasing from 67.73% to 84.10%. Experiments with adversarial samples added to the training set show that the recall of the proposed model increases by 0.29% and the mAP by 0.56%.
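The adversarial samples evaluated above follow the fast gradient sign method (FGSM) of Goodfellow et al. (2015): each input is perturbed a small step eps in the direction of the sign of the loss gradient with respect to the input. A minimal sketch on a logistic model, which stands in for the detector (the model, eps, and input sizes are illustrative, not the paper's configuration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps=0.03):
    # x_adv = clip(x + eps * sign(d loss / d x)); for cross-entropy loss
    # on a logistic model p = sigmoid(w.x + b), the input gradient
    # simplifies to (p - y) * w
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

rng = np.random.default_rng(0)
x = rng.random(16)              # stand-in input (e.g. flattened pixels in [0, 1])
w = rng.standard_normal(16)     # stand-in model weights
x_adv = fgsm(x, y=1.0, w=w, b=0.0)
print(np.abs(x_adv - x).max())  # perturbation is bounded by eps
```

Mixing such x_adv samples into the training set alongside the clean data is what adversarial training refers to here: the model is pushed to classify correctly even under worst-case small perturbations.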
Conclusion
A method fusing environmental features with an improved YOLOv4 is proposed for safety helmet wearing detection. The method uses convolutional neural networks to extract convolutional features, greatly improving the efficiency of feature extraction and object detection. Moreover, effectively combining high-level and low-level information by fusing different feature maps improves the detection accuracy of small targets, and multiple data augmentation methods improve the robustness of the model in complex scenarios. End-to-end training of the detection algorithm is realized, and the accuracy, generalization ability, and robustness of the model are improved for effective detection of safety helmet wearing.
Keywords: safety helmet wearing detection; feature map fusion; data enhancement; adversarial examples; YOLOv4
References
Bochkovskiy A, Wang C Y and Liao H Y M. 2020. YOLOv4: optimal speed and accuracy of object detection [EB/OL]. [2021-09-23]. https://arxiv.org/pdf/2004.10934.pdf
Fang M, Sun T T and Shao Z. 2019. Fast helmet-wearing-condition detection based on improved YOLOv2. Optical Precision Engineering, 27(5): 1196-1205 [DOI: 10.3788/OPE.20192705.1196]
Felzenszwalb P, McAllester D and Ramanan D. 2008. A discriminatively trained, multiscale, deformable part model//Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, USA: IEEE: 1-8 [DOI: 10.1109/CVPR.2008.4587597]
Girshick R. 2015. Fast R-CNN//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 1440-1448 [DOI: 10.1109/ICCV.2015.169]
Girshick R, Donahue J, Darrell T and Malik J. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 580-587 [DOI: 10.1109/CVPR.2014.81]
Goodfellow I J, Shlens J and Szegedy C. 2015. Explaining and harnessing adversarial examples [EB/OL]. [2021-09-23]. https://arxiv.org/pdf/1412.6572.pdf
Ioffe S and Szegedy C. 2015. Batch normalization: accelerating deep network training by reducing internal covariate shift [EB/OL]. [2021-09-23]. https://arxiv.org/pdf/1502.03167.pdf
Jiang W T, Zhang C, Zhang S C and Liu W J. 2019. Multiscale feature map fusion algorithm for target detection. Journal of Image and Graphics, 24(11): 1918-1931 [DOI: 10.11834/jig.190021]
Kim S W, Kook H K, Sun J Y, Kang M C and Ko S J. 2018. Parallel feature pyramid network for object detection//Proceedings of European Conference on Computer Vision. Munich, Germany: Springer: 239-256 [DOI: 10.1007/978-3-030-01228-1_15]
Kumar S, Neware N, Jain A, Swain D and Singh P. 2020. Automatic helmet detection in real-time and surveillance video//Machine Learning and Information Processing. Singapore: Springer: 51-60 [DOI: 10.1007/978-981-15-1884-3_5]
Law H and Deng J. 2018. CornerNet: detecting objects as paired keypoints//Proceedings of European Conference on Computer Vision. Munich, Germany: Springer: 765-781 [DOI: 10.1007/978-3-030-01264-9_45]
Law H, Teng Y, Russakovsky O and Deng J. 2020. CornerNet-Lite: efficient keypoint based object detection [EB/OL]. [2021-09-23]. https://arxiv.org/pdf/1904.08900.pdf
Lin T Y, Goyal P, Girshick R, He K M and Dollár P. 2017. Focal loss for dense object detection//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2999-3007 [DOI: 10.1109/ICCV.2017.324]
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y and Berg A C. 2016. SSD: single shot MultiBox detector//Proceedings of European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 21-37 [DOI: 10.1007/978-3-319-46448-0_2]
Liu X H and Ye X N. 2014. Skin color detection and Hu moments in helmet recognition research. Journal of East China University of Science and Technology (Natural Science Edition), 40(3): 365-370 [DOI: 10.3969/j.issn.1006-3080.2014.03.016]
Nair V and Hinton G E. 2010. Rectified linear units improve restricted Boltzmann machines//Proceedings of the 27th International Conference on Machine Learning. Haifa, Israel: Omnipress: 807-814
Redmon J, Divvala S, Girshick R and Farhadi A. 2016. You only look once: unified, real-time object detection//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 779-788 [DOI: 10.1109/CVPR.2016.91]
Redmon J and Farhadi A. 2017. YOLO9000: better, faster, stronger//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Hawaii, USA: IEEE: 6517-6525 [DOI: 10.1109/CVPR.2017.690]
Redmon J and Farhadi A. 2018. YOLOv3: an incremental improvement [EB/OL]. [2021-09-23]. https://arxiv.org/pdf/1804.02767.pdf
Ren S Q, He K M, Girshick R and Sun J. 2017. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6): 1137-1149 [DOI: 10.1109/TPAMI.2016.2577031]
Shi H, Chen X Q and Yang Y. 2019. Safety helmet wearing detection method of improved YOLOv3. Computer Engineering and Applications, 55(11): 213-220 [DOI: 10.3778/j.issn.1002-8331.1811-0389]
Silva R R V E, Aires K R T and de Melo Souza Veras R. 2014. Helmet detection on motorcyclists using image descriptors and classifiers//Proceedings of the 27th SIBGRAPI Conference on Graphics, Patterns and Images. Rio de Janeiro, Brazil: IEEE: 141-148 [DOI: 10.1109/SIBGRAPI.2014.28]
Viola P and Jones M. 2001. Rapid object detection using a boosted cascade of simple features//Proceedings of 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Kauai, USA: IEEE: 511-518 [DOI: 10.1109/CVPR.2001.990517]
Viola P and Jones M J. 2004. Robust real-time face detection. International Journal of Computer Vision, 57(2): 137-154 [DOI: 10.1023/B:VISI.0000013087.49260.fb]
Xu D Q and Wu Y Q. 2020. Improved YOLO-V3 with DenseNet for multi-scale remote sensing target detection. Sensors, 20(15): #4276 [DOI: 10.3390/s20154276]
Yun S, Han D, Chun S, Oh S J, Yoo Y and Choe J. 2019. CutMix: regularization strategy to train strong classifiers with localizable features//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 6022-6031 [DOI: 10.1109/ICCV.2019.00612]
Zeng J X, Fang Q, Fu X and Leng L. 2019. Multi-scale pedestrian detection algorithm with multi-layer features. Journal of Image and Graphics, 24(10): 1683-1691 [DOI: 10.11834/jig.190009]
Zhang S F, Wen L Y, Bian X, Lei Z and Li S Z. 2018. Single-shot refinement neural network for object detection//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4203-4212 [DOI: 10.1109/CVPR.2018.00442]