Small-scale pedestrian detection based on improved R-FCN model
Vol. 26, Issue 10, Pages 2400-2410 (2021)
Published: 16 October 2021
Accepted: 02 September 2020
DOI: 10.11834/jig.200287
Wanjun Liu, Libing Dong, Haicheng Qu. Small-scale pedestrian detection based on improved R-FCN model [J]. Journal of Image and Graphics, 26(10): 2400-2410 (2021)
Objective
To address the low detection accuracy of traditional pedestrian detection algorithms in scenarios with low resolution and small pedestrian size, the region-based fully convolutional network (R-FCN) object detection algorithm is introduced into pedestrian detection, and an improved R-FCN model for small-scale pedestrian detection is proposed.
Method
To make feature extraction more accurate, deformable convolutional layers are embedded in the conv5 stage of ResNet-101 to enlarge the receptive field of the feature map. To improve detection accuracy for small-scale pedestrians, another detection path is added to ResNet-101 so that region-of-interest pooling is performed on feature maps of different sizes. To reduce false detections of small-scale pedestrians, a bootstrapped non-maximum suppression algorithm replaces the traditional non-maximum suppression algorithm.
Result
Evaluated on the benchmark Caltech dataset, the improved R-FCN algorithm raises detection accuracy by 3.29% over the representative single-stage detector SSD (single shot MultiBox detector) and by 2.78% over the two-stage detector Faster R-CNN (region-based convolutional neural network). With the same ResNet-101 backbone, its detection accuracy is 12.10% higher than that of the original R-FCN algorithm.
Conclusion
The improved R-FCN model proposed in this paper makes small-scale pedestrian detection more accurate. Compared with the original model, it balances precision and recall better, achieving a higher recall while maintaining precision.
Objective
Pedestrian detection is a research hotspot in image processing and computer vision, with wide applications in fields such as autonomous driving, intelligent surveillance, and intelligent robots. Traditional pedestrian detection methods based on background modeling and machine learning can achieve a reasonable detection rate under certain conditions, but they cannot meet the requirements of practical applications. As deep convolutional neural networks have made great progress in general object detection, more and more researchers have adapted general object detection frameworks to pedestrian detection. Compared with traditional methods, deep-learning-based pedestrian detection has significantly improved accuracy and robustness, and many breakthroughs have been made. However, detection of small-scale pedestrians remains unsatisfactory. This is mainly because the successive convolution and pooling operations of a convolutional neural network shrink the feature maps of small-scale pedestrians, lower their resolution, and cause severe information loss, leading to detection failure. To effectively address the low detection accuracy of traditional pedestrian detection algorithms at low resolution and small pedestrian size, an object detection algorithm called region-based fully convolutional network (R-FCN) is introduced into pedestrian detection, and this study proposes an improved R-FCN-based small-scale pedestrian detection algorithm.
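The downsampling effect described above can be illustrated with a short sketch (the pedestrian dimensions below are hypothetical examples, not figures from the paper): the footprint of a bounding box on the feature map shrinks with the backbone's effective stride.

```python
# Footprint (in feature-map cells) of a pedestrian bounding box after
# downsampling by a backbone with a given effective stride.
def feature_footprint(box_h_px, box_w_px, stride):
    """Return the approximate (height, width) of the box on the feature map."""
    return max(1, box_h_px // stride), max(1, box_w_px // stride)

# A 50-pixel-tall pedestrian at stride 32 occupies only about one
# feature cell, versus roughly 3x1 cells at stride 16, which is why
# small-scale pedestrians lose so much information after downsampling.
print(feature_footprint(50, 20, 32))  # (1, 1)
print(feature_footprint(50, 20, 16))  # (3, 1)
```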
Method
The method in this study inherits the advantages of R-FCN, which employs a region proposal network to generate candidate regions of interest and position-sensitive score maps to classify and locate targets. Because the residual network ResNet-101 offers low computational cost, few parameters, and good accuracy, this study uses it as the backbone network. Compared with the original R-FCN, this study makes the following improvements. First, considering that pedestrians in the Caltech dataset appear at multiple scales, all 3×3 convolutional layers of the Conv5 stage of ResNet-101 are expanded into deformable convolutional layers. The effective stride of the convolution block is thereby reduced from 32 pixels to 16 pixels, the dilation rate is changed from 1 to 2, the padding is set to 2, and the stride is 1. Deformable convolution increases the generalization ability of the model, enlarges the receptive field of the feature map, and improves the accuracy of R-FCN feature extraction. Second, another position-sensitive score map is added in the training phase. Because the feature-discriminating ability of the Conv1-3 stages of ResNet-101 is weaker than that of the Conv4 stage, a new position-sensitive score map is added after the Conv4 layer to detect multi-scale pedestrians in parallel with the original position-sensitive score map after the Conv5 layer. Finally, the traditional non-maximum suppression (NMS) method often misses neighboring pedestrians in crowded scenes. Therefore, this study improves the traditional NMS algorithm and proposes a bootstrapped NMS algorithm to reduce pedestrian misdetection.
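The Conv5 settings quoted above (3×3 kernel, stride 1, dilation 2, padding 2) follow the standard convolution output-size formula, under which spatial resolution is preserved and the effective backbone stride stays at 16 instead of dropping to 32. A minimal sketch of that arithmetic (`conv_out_size` is a hypothetical helper, not part of the paper's code):

```python
# Standard convolution output-size formula:
#   out = floor((in + 2*pad - dilation*(kernel - 1) - 1) / stride) + 1
def conv_out_size(in_size, kernel=3, stride=1, pad=0, dilation=1):
    return (in_size + 2 * pad - dilation * (kernel - 1) - 1) // stride + 1

# With kernel=3, stride=1, dilation=2, pad=2, the spatial size of the
# feature map is unchanged, so Conv5 no longer halves the resolution.
for size in (14, 28, 38):
    assert conv_out_size(size, kernel=3, stride=1, pad=2, dilation=2) == size
print("resolution preserved")
```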
Result
The experiment is evaluated on the benchmark Caltech dataset. The results show that the improved R-FCN algorithm improves detection accuracy by 3.29% and 2.78% over the representative single-stage detector SSD (single shot MultiBox detector) and the two-stage detector Faster R-CNN (faster region-based convolutional neural network), respectively. With the same ResNet-101 backbone, its detection accuracy is 12.10% higher than that of the original R-FCN algorithm. Online hard example mining (OHEM) is essential on Caltech, yielding a 7.38% improvement, because the dataset contains a large number of confounding instances in complex backgrounds that OHEM can fully exploit. Using deformable convolutional layers in the Conv5 stage of ResNet-101 improves accuracy by 0.89% over ordinary convolutional layers. The multi-path detection structure increases detection accuracy by 2.50%. The bootstrapped non-maximum suppression is 1.67% better than the traditional NMS algorithm.
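For reference, the traditional greedy NMS baseline that the bootstrapped variant is measured against can be sketched in pure Python (this is the standard algorithm, not the paper's improved version; `iou` is an illustrative helper):

```python
# Greedy non-maximum suppression: keep the highest-scoring box, drop
# every remaining box whose IoU with it exceeds the threshold, repeat.
def iou(a, b):
    """Intersection-over-union of boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, thresh=0.5):
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thresh]
    return keep

# Two heavily overlapping detections plus one separate one: the weaker
# overlapping box is suppressed. In a crowd, that suppressed box may be
# a true neighboring pedestrian -- the failure mode the paper targets.
boxes = [(0, 0, 10, 20), (1, 0, 11, 20), (30, 0, 40, 20)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, thresh=0.5))  # [0, 2]
```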
Conclusion
The improved R-FCN model proposed in this study makes the detection of small-scale pedestrians more accurate and reduces false detections of pedestrians at low resolution. Compared with the original R-FCN model, it balances precision and recall better, achieving a higher recall while maintaining precision. However, its detection accuracy in complex scenes is still somewhat low; thus, future research will focus on improving pedestrian detection accuracy in complex scenes.
Keywords: pedestrian detection; region-based fully convolutional network (R-FCN); deformable convolution; multipath; non-maximum suppression (NMS); Caltech dataset