Image classification network with random dilated convolution
2025, Pages 1-20
Published online: 2025-01-23
Accepted: 2025-01-16
DOI: 10.11834/jig.240746
Jiang W T, You Z C and Yuan H. 2025. Image classification network with random dilated convolution[J/OL]. Journal of Image and Graphics: 1-20
Objective
To address the difficulty of extracting fine-grained features in image classification, and the problem that background noise and irrelevant regions interfere with the network's learning of target features, this paper proposes an image classification network with random dilated convolution (RDCNet).
Method
RDCNet uses ResNet-34 as its baseline network. First, a multi-branch random dilated convolution (MRDC) module is proposed: through convolution operations on multiple branches and randomly dilated convolution kernels, it captures fine-grained features effectively across different scales and receptive fields. A fine-grained feature enhancement (FGFE) module is then introduced to learn global information and strengthen local features, improving the network's local feature extraction and global context understanding. A random masking mechanism is also introduced to dynamically mask part of the input features and convolution kernel weights; by learning from diversified feature combinations it obtains more robust representations, while effectively reducing overfitting and improving adaptability to noise and irrelevant regions. Finally, a context excitation (CE) module is proposed, which introduces contextual information and dynamically adjusts the weights of feature channels, strengthening the network's attention to key features, suppressing background noise, and improving the expressiveness of the features.
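To make the MRDC idea concrete, the following PyTorch sketch builds a multi-branch 3x3 convolution block whose dilation rate is redrawn at random on every training forward pass. The class name, the candidate dilation set, the summation fusion, and the fixed evaluation-time rate are illustrative assumptions, not the authors' exact design.

```python
import random

import torch
import torch.nn as nn
import torch.nn.functional as F


class MRDC(nn.Module):
    """Sketch of a multi-branch random dilated convolution block."""

    def __init__(self, channels, num_branches=3, dilation_choices=(1, 2, 3, 5)):
        super().__init__()
        self.dilation_choices = dilation_choices
        # One 3x3 kernel per branch; padding is chosen per call so the
        # spatial size is preserved for any sampled dilation rate.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, bias=False)
            for _ in range(num_branches)
        )
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.zeros_like(x)
        for conv in self.branches:
            # Draw a fresh dilation rate per branch while training; fall back
            # to a fixed rate at eval time so inference is deterministic
            # (an assumption, not necessarily the paper's choice).
            d = random.choice(self.dilation_choices) if self.training else 2
            out = out + F.conv2d(x, conv.weight, padding=d, dilation=d)
        return F.relu(self.bn(out))
```

A quick shape check, e.g. `MRDC(64)(torch.randn(2, 64, 32, 32))`, returns a tensor of the same size, because the padding tracks the sampled dilation rate.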
Result
The proposed method achieves good classification accuracy on the CIFAR-10, CIFAR-100, SVHN, Imagenette, and Imagewoof datasets, improving on the second-best model by 0.02%, 1.12%, 0.32%, 4.73%, and 3.56%, respectively. The experimental results show that RDCNet delivers high classification performance.
Conclusion
The proposed image classification network with random dilated convolution is more sensitive to fine-grained features, extracts rich feature information across multiple scales and contexts, attends well to key features, and discriminates targets against complex backgrounds more effectively, thereby achieving excellent performance in classification tasks.
Objective
Image classification is one of the most common tasks in modern computer vision, yet despite the continuous development of deep learning methods, effectively extracting fine-grained features, suppressing noise interference, and handling target features in complex backgrounds remain challenging. Residual networks in particular, although they have strong feature-learning capability, often struggle to learn fine-grained features efficiently because of the diversity of training data, background noise, and scale differences between target objects. Conventional convolution operations tend to ignore detailed information at different scales when dealing with such problems and are prone to overfitting when facing noisy or irrelevant regions, which degrades performance. To address these challenges, this paper proposes an image classification network with random dilated convolution (RDCNet). The network is designed to overcome difficult fine-grained feature extraction, background noise interference, and overfitting; through a series of novel designs, it extracts key features from complex backgrounds more effectively and improves the network's classification ability.
Method
RDCNet adopts the classical ResNet-34 as its backbone, exploiting the residual connections to improve training depth and stability, and builds several novel modules on this basis to strengthen feature extraction.

First, the multi-branch random dilated convolution (MRDC) module captures fine-grained features across different scales and receptive fields through multi-branch convolution operations and randomly dilated convolution kernels. Compared with conventional convolution, dilated convolution enlarges the receptive field without increasing the computational cost and therefore captures multi-scale information more effectively; the randomized dilation rates diversify the kernel structure so that the network adapts to targets of different scales and variations, further strengthening fine-grained feature extraction.

To enrich the representation of local features, the fine-grained feature enhancement (FGFE) module fuses global information with local features, increasing the network's sensitivity to small objects and fine details. The module extracts global features with a global average pooling operation and models them jointly with local features, helping the network understand contextual information; with this enhancement mechanism, the network classifies subtle differences more accurately and recognizes tiny targets better.

Meanwhile, a random masking mechanism dynamically masks part of the input features and convolution kernel weights. It not only learns more robust representations from diversified feature combinations but also reduces overfitting and improves adaptability to noise and irrelevant inputs. The mechanism resembles Dropout, but differs in that it dynamically masks both features and convolution kernels, making it more flexible for complex input data.

Finally, the context excitation (CE) module strengthens the network's focus on key features by introducing contextual information and dynamically adjusting the weights of feature channels. In image classification, features from certain regions are often critical to the final decision while background noise can be harmful; by adaptively adjusting the importance of each channel, the module concentrates the network on the features that contribute to classification and suppresses interference from irrelevant regions.
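The FGFE description above (global average pooling modeled jointly with local features) can be sketched as a gating block in the same spirit, complementing the MRDC sketch earlier. The bottleneck ratio, the sigmoid gate, and the residual connection below are assumptions for illustration rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn


class FGFE(nn.Module):
    """Sketch of a fine-grained feature enhancement block."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        # Global branch: squeeze the whole map to a per-channel descriptor,
        # then project it through a small bottleneck.
        self.global_proj = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )
        # Local branch: a plain 3x3 refinement of the input features.
        self.local_conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        local = self.local_conv(x)
        # Broadcast the global descriptor over all spatial positions and
        # use it to gate the locally refined features.
        gate = torch.sigmoid(self.global_proj(x))
        return x + local * gate
```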
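The random masking mechanism, as described, masks both input features and kernel weights during training. Below is a minimal sketch combining Dropout on the input with DropConnect-style weight masking; the drop probabilities, the rescaling convention, and the class name are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RandomMaskedConv2d(nn.Conv2d):
    """Conv2d whose inputs and weights are randomly masked while training."""

    def __init__(self, *args, feat_drop=0.1, weight_drop=0.1, **kwargs):
        super().__init__(*args, **kwargs)
        self.feat_drop = feat_drop
        self.weight_drop = weight_drop

    def forward(self, x):
        weight = self.weight
        if self.training:
            # Dynamically mask part of the input feature map ...
            x = F.dropout(x, p=self.feat_drop)
            # ... and part of the kernel weights (DropConnect-style),
            # rescaling so the expected response is unchanged.
            keep = 1.0 - self.weight_drop
            mask = torch.bernoulli(torch.full_like(weight, keep))
            weight = weight * mask / keep
        return F.conv2d(x, weight, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)
```

At evaluation time the class behaves exactly like a standard nn.Conv2d, so it can be dropped into existing blocks without changing inference.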
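The CE module reads like a squeeze-and-excitation-style channel gate driven by pooled context. The sketch below assumes a mean-plus-max context summary and a two-layer bottleneck; the paper's exact statistics and gating may differ.

```python
import torch
import torch.nn as nn


class ContextExcitation(nn.Module):
    """Sketch of a context-driven channel re-weighting block."""

    def __init__(self, channels, reduction=8):
        super().__init__()
        # Two pooled context summaries per channel feed a small gate network.
        self.fc = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = x.mean(dim=(2, 3))   # average context per channel
        mx = x.amax(dim=(2, 3))    # peak context per channel
        w = self.fc(torch.cat([avg, mx], dim=1)).view(b, c, 1, 1)
        # Amplify channels carrying target evidence, damp background ones.
        return x * w
```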
Result
To verify the effectiveness of RDCNet in image classification, experiments are conducted on the CIFAR-10, CIFAR-100, SVHN, Imagenette, and Imagewoof datasets, and the results show clear performance gains on all of them. RDCNet outperforms the second-best model by 0.02% on CIFAR-10, confirming its strength in fine-grained feature extraction; by 1.12% on CIFAR-100, demonstrating its advantage on larger-scale classification tasks; by 0.32% on SVHN, showing its effectiveness on street-view digit recognition; by 4.73% on Imagenette, verifying its ability to recognize target features against more complex backgrounds; and by 3.56% on Imagewoof, further confirming its performance across different scenarios.

In addition, ablation experiments analyze the role of each module of RDCNet: by removing key modules one at a time and evaluating the impact on overall performance, the experiments demonstrate the unique contribution of each module and show that their synergy keeps the network performant on complex tasks. Together, these experiments establish the superiority of RDCNet on multiple datasets, especially in the presence of complex backgrounds, noise, and fine-grained features.
Conclusion
This paper proposes an image classification network with random dilated convolution (RDCNet) that effectively addresses fine-grained feature extraction, noise suppression, and overfitting in image classification through a novel multi-branch random dilated convolution module, a fine-grained feature enhancement module, a random masking mechanism, and a context excitation module. Experimental results show that RDCNet improves classification performance significantly on multiple standard datasets, and in particular improves target recognition in complex backgrounds.

The key contributions of RDCNet are a stronger sensitivity to fine-grained features, obtained through multi-scale feature extraction, fine-grained feature enhancement, and contextual information modeling; the ability to extract rich feature information across scales and contexts; better focus on key features; and better discrimination of targets in complex scenes, which together yield excellent classification performance. In addition, the random masking mechanism strengthens the robustness of the network and reduces the risk of overfitting, allowing it to behave more stably in complex real-world scenarios. Future research can explore the application of this method in other visual tasks and combine it with other advanced techniques to further improve its performance.
image classification; residual networks; dilated convolution; random dilated convolution; fine-grained features; random masking mechanism