Safety helmet wearing detection method of fusing environmental features and improved YOLOv4
Vol. 26, Issue 12, Pages 2904-2917 (2021)
Published: 16 December 2021
Accepted: 13 January 2021
DOI: 10.11834/jig.200606
Qingqing Ge, Zhijie Zhang, Long Yuan, Xiumei Li, Junmei Sun. 2021. Safety helmet wearing detection method of fusing environmental features and improved YOLOv4 [J]. Journal of Image and Graphics, 26(12): 2904-2917
Abstract
Objective: On construction sites, the safety helmet is the most common and practical piece of personal protective equipment, effectively preventing and mitigating head injuries caused by accidents. However, in helmet-wearing detection on construction sites, small targets are often hard to detect, and complex, changeable environmental factors reduce detection accuracy. To address these problems, a safety helmet wearing detection method fusing environmental features with an improved YOLOv4 (you only look once version 4) is proposed.
Method: To recover features lost during convolution, pooling, and similar operations, the three output feature maps of different sizes produced by YOLOv4 are added, under the condition that their receptive fields are consistent, to feature maps extracted from the original image, fusing high-level and low-level features to capture more detailed information. A 3×3 convolution is then applied to the fused feature maps to reduce the aliasing effect of fusion and keep the features stable. To adapt to the varied environments of construction sites, multiple data augmentation methods simulate environmental conditions, and adversarial training strengthens the generalization ability and robustness of the model.
Result: Tested on the open-source safety helmet wearing dataset (SHWD), the improved YOLOv4 reaches a mean average precision (mAP) of 91.55%, outperforming several popular object detection algorithms; compared with YOLOv4, mAP improves by 5.2%. After data augmentation with fused environmental features, mAP improves by a further 4.27%, and the method performs stably when tested under various real-world conditions.
Conclusion: The proposed method, fusing environmental features with an improved YOLOv4, improves the accuracy, generalization ability, and robustness of the model through model improvement and data augmentation, providing an effective guarantee for safety helmet wearing detection.
Objective
National production safety data in 2019 showed that 95% of production safety accidents were caused by unsafe behaviors of workers, including improperly wearing protective equipment. Safety helmet wearing detection therefore plays a vital role in production safety. An end-to-end detection algorithm with high accuracy and strong generalization ability is of great significance for ensuring operators' personal safety and reducing safety accidents. Safety helmet wearing detection belongs to the category of object detection. Early object detection algorithms mostly relied on manually constructed features. With the development of deep learning, object detection has divided into "two-stage detection" and "one-stage detection", and these series of detectors greatly improved both detection speed and detection accuracy. However, current deep learning algorithms still fail to ensure accurate detection of small targets and are not generally applicable across scenarios, resulting in poor generalization ability and weak anti-interference ability. To solve these problems, a safety helmet wearing detection method that combines environmental characteristics with an improved you only look once version 4 (YOLOv4) is proposed to achieve efficient detection of safety helmet wearing.
Method
Based on YOLOv4, cross stage partial Darknet53 (CSPDarknet53) is used as the backbone network, with a path aggregation network (PANet) and spatial pyramid pooling (SPP) as the neck. YOLOv4 produces feature maps at three different scales: with an input size of 608×608 pixels, the resolutions of the YOLO heads are 76×76, 38×38, and 19×19 pixels, respectively. Because the information in high-level and low-level feature maps differs greatly, features are also extracted directly from the original input image until they reach the same resolutions as the YOLO heads. For the original image, a 3×3 convolution is applied, followed by a batch normalization (BN) layer and a ReLU activation, which provides unilateral suppression and sparse activation. This process is iterated until the resolution of the feature map matches the size of the corresponding YOLO head. Then, under the condition that the receptive fields are consistent, the three output feature maps of YOLOv4 are added to the feature maps extracted from the original image, fusing high-level and low-level features to capture more detailed information. After that, a 3×3 convolution is applied to each fused feature map to reduce the aliasing effect of fusion, yielding outputs at the three scales. The feature maps extracted from the original image represent a shallow network with high resolution and more detailed features, which helps predict location information, while the YOLO heads represent a deep network with low resolution and more semantic features, which helps decide category information. By combining the two kinds of feature maps, the model achieves higher accuracy on both large and small targets. Moreover, data augmentation techniques, such as random cropping, CutMix, simulating environments corrupted with noise, and adversarial training with adversarial samples, add small disturbances to the training data. Augmenting the training data in this way improves the generalization ability and the robustness of the model.
Result
The improved YOLOv4 is tested on the open-source safety helmet wearing dataset (SHWD). The mean average precision (mAP) reaches 91.55% and the recall reaches 98.62%. Compared with existing algorithms such as CornerNet-Lite, Faster region-based convolutional neural network (Faster R-CNN), YOLOv3, and YOLOv4, the proposed method improves both mAP and recall. Compared with the original YOLOv4, the improved YOLOv4 increases mAP and recall by 5.2% and 5.79%, respectively. The data augmentation methods adopted in this paper improve the mAP of CornerNet-Lite, Faster R-CNN, YOLOv3, YOLOv4, and the improved YOLOv4 by 2% to 5%; for the improved YOLOv4, mAP increases by 4.27%, from 91.55% to 95.82%. In addition, the proposed method performs more stably after data augmentation when tested under different environmental conditions. For instance, detection performance on night images improves markedly, with mAP increasing from 67.73% to 84.10%. Experiments with adversarial samples added to the training set show that the recall of the proposed model increases by 0.29% and the mAP by 0.56%.
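The adversarial samples evaluated above follow the fast gradient sign method (FGSM) of Goodfellow et al. (2015): each input is perturbed a small step eps in the direction of the sign of the loss gradient with respect to the input. A minimal sketch on a logistic model, which stands in for the detector (the model, eps, and input sizes are illustrative, not the paper's configuration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps=0.03):
    # x_adv = clip(x + eps * sign(d loss / d x)); for cross-entropy loss
    # on a logistic model p = sigmoid(w.x + b), the input gradient
    # simplifies to (p - y) * w
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

rng = np.random.default_rng(0)
x = rng.random(16)              # stand-in input (e.g. flattened pixels in [0, 1])
w = rng.standard_normal(16)     # stand-in model weights
x_adv = fgsm(x, y=1.0, w=w, b=0.0)
print(np.abs(x_adv - x).max())  # perturbation is bounded by eps
```

Mixing such x_adv samples into the training set alongside the clean data is what adversarial training refers to here: the model is pushed to classify correctly even under worst-case small perturbations.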
Conclusion
A method fusing environmental features with an improved YOLOv4 is proposed for safety helmet wearing detection. The method uses convolutional neural networks to extract convolutional features, greatly improving the efficiency of feature extraction and object detection. Moreover, effectively combining high-level and low-level information by fusing different feature maps improves the detection accuracy of small targets, and multiple data augmentation methods improve the robustness of the model in complex scenarios. End-to-end training of the detection algorithm is realized, and the accuracy, generalization ability, and robustness of the model are improved for effective detection of safety helmet wearing.
Keywords: safety helmet wearing detection; feature map fusion; data enhancement; adversarial examples; YOLOv4
References
Bochkovskiy A, Wang C Y and Liao H Y M. 2020. YOLOv4: optimal speed and accuracy of object detection [EB/OL]. [2021-09-23]. https://arxiv.org/pdf/2004.10934.pdf
Fang M, Sun T T and Shao Z. 2019. Fast helmet-wearing-condition detection based on improved YOLOv2. Optical Precision Engineering, 27(5): 1196-1205 [DOI: 10.3788/OPE.20192705.1196]
Felzenszwalb P, McAllester D and Ramanan D. 2008. A discriminatively trained, multiscale, deformable part model//Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, USA: IEEE: 1-8 [DOI: 10.1109/CVPR.2008.4587597]
Girshick R. 2015. Fast R-CNN//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 1440-1448 [DOI: 10.1109/ICCV.2015.169]
Girshick R, Donahue J, Darrell T and Malik J. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 580-587 [DOI: 10.1109/CVPR.2014.81]
Goodfellow I J, Shlens J and Szegedy C. 2015. Explaining and harnessing adversarial examples [EB/OL]. [2021-09-23]. https://arxiv.org/pdf/1412.6572.pdf
Ioffe S and Szegedy C. 2015. Batch normalization: accelerating deep network training by reducing internal covariate shift [EB/OL]. [2021-09-23]. https://arxiv.org/pdf/1502.03167.pdf
Jiang W T, Zhang C, Zhang S C and Liu W J. 2019. Multiscale feature map fusion algorithm for target detection. Journal of Image and Graphics, 24(11): 1918-1931 [DOI: 10.11834/jig.190021]
Kim S W, Kook H K, Sun J Y, Kang M C and Ko S J. 2018. Parallel feature pyramid network for object detection//Proceedings of European Conference on Computer Vision. Munich, Germany: Springer: 239-256 [DOI: 10.1007/978-3-030-01228-1_15]
Kumar S, Neware N, Jain A, Swain D and Singh P. 2020. Automatic helmet detection in real-time and surveillance video//Machine Learning and Information Processing. Singapore: Springer: 51-60 [DOI: 10.1007/978-981-15-1884-3_5]
Law H and Deng J. 2018. CornerNet: detecting objects as paired keypoints//Proceedings of European Conference on Computer Vision. Munich, Germany: Springer: 765-781 [DOI: 10.1007/978-3-030-01264-9_45]
Law H, Teng Y, Russakovsky O and Deng J. 2020. CornerNet-Lite: efficient keypoint based object detection [EB/OL]. [2021-09-23]. https://arxiv.org/pdf/1904.08900.pdf
Lin T Y, Goyal P, Girshick R, He K M and Dollár P. 2017. Focal loss for dense object detection//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2999-3007 [DOI: 10.1109/ICCV.2017.324]
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y and Berg A C. 2016. SSD: single shot MultiBox detector//Proceedings of European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 21-37 [DOI: 10.1007/978-3-319-46448-0_2]
Liu X H and Ye X N. 2014. Skin color detection and Hu moments in helmet recognition research. Journal of East China University of Science and Technology (Natural Science Edition), 40(3): 365-370 [DOI: 10.3969/j.issn.1006-3080.2014.03.016]
Nair V and Hinton G E. 2010. Rectified linear units improve restricted Boltzmann machines//Proceedings of the 27th International Conference on Machine Learning. Haifa, Israel: Omnipress: 807-814
Redmon J, Divvala S, Girshick R and Farhadi A. 2016. You only look once: unified, real-time object detection//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 779-788 [DOI: 10.1109/CVPR.2016.91]
Redmon J and Farhadi A. 2017. YOLO9000: better, faster, stronger//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Hawaii, USA: IEEE: 6517-6525 [DOI: 10.1109/CVPR.2017.690]
Redmon J and Farhadi A. 2018. YOLOv3: an incremental improvement [EB/OL]. [2021-09-23]. https://arxiv.org/pdf/1804.02767.pdf
Ren S Q, He K M, Girshick R and Sun J. 2017. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6): 1137-1149 [DOI: 10.1109/TPAMI.2016.2577031]
Shi H, Chen X Q and Yang Y. 2019. Safety helmet wearing detection method of improved YOLOv3. Computer Engineering and Applications, 55(11): 213-220 [DOI: 10.3778/j.issn.1002-8331.1811-0389]
Silva R R V E, Aires K R T and de Melo Souza Veras R. 2014. Helmet detection on motorcyclists using image descriptors and classifiers//Proceedings of the 27th SIBGRAPI Conference on Graphics, Patterns and Images. Rio de Janeiro, Brazil: IEEE: 141-148 [DOI: 10.1109/SIBGRAPI.2014.28]
Viola P and Jones M. 2001. Rapid object detection using a boosted cascade of simple features//Proceedings of 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Kauai, USA: IEEE: 511-518 [DOI: 10.1109/CVPR.2001.990517]
Viola P and Jones M J. 2004. Robust real-time face detection. International Journal of Computer Vision, 57(2): 137-154 [DOI: 10.1023/B:VISI.0000013087.49260.fb]
Xu D Q and Wu Y Q. 2020. Improved YOLO-V3 with DenseNet for multi-scale remote sensing target detection. Sensors, 20(15): #4276 [DOI: 10.3390/s20154276]
Yun S, Han D, Chun S, Oh S J, Yoo Y and Choe J. 2019. CutMix: regularization strategy to train strong classifiers with localizable features//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 6022-6031 [DOI: 10.1109/ICCV.2019.00612]
Zeng J X, Fang Q, Fu X and Leng L. 2019. Multi-scale pedestrian detection algorithm with multi-layer features. Journal of Image and Graphics, 24(10): 1683-1691 [DOI: 10.11834/jig.190009]
Zhang S F, Wen L Y, Bian X, Lei Z and Li S Z. 2018. Single-shot refinement neural network for object detection//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4203-4212 [DOI: 10.1109/CVPR.2018.00442]