Double-pooling residual classification network based on feature reordering attention mechanism
2025, Vol. 30, No. 1, Pages: 110-129
Print publication date: 2025-01-16
DOI: 10.11834/jig.240061
Yuan H, Liu J, Jiang W T and Liu W J. 2025. Double-pooling residual classification network based on feature reordering attention mechanism. Journal of Image and Graphics, 30(1): 110-129 [DOI: 10.11834/jig.240061]
Objective
In residual image classification networks, insufficient interaction of channel information prevents channel features from being used effectively, and the residual structure causes part of the feature information to be lost. To address these problems, this paper proposes a double-pooling residual classification network based on a feature reordering attention mechanism (FDPRNet).
Method
FDPRNet is based on the ResNet-34 residual network. First, the kernel size of the first convolutional layer is replaced from 7 × 7 with 3 × 3 to retain more feature information and strengthen the network's nonlinear expression ability, and the max-pooling layer is removed to improve the ability to capture local details. Then, a feature reordering attention mechanism (FRAM) module is proposed: the feature-map channels are divided into groups and reordered both between and within groups, one-dimensional convolution extracts the features of each channel combination, and the results are concatenated to obtain the weights of the reordered features. Finally, a double-pooling residual (DPR) module is proposed, which applies max pooling and average pooling to the feature map in parallel, then performs element-wise addition and a convolutional mapping on the pooled feature maps to extract key features and reduce the feature-map size.
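Below is a minimal sketch of the stem modification described above, assuming torchvision's ResNet-34 as the backbone. The layer names conv1 and maxpool are torchvision's; the stride-1 setting and the 100-class head are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34

model = resnet34(num_classes=100)  # 100 classes assumed for illustration
# Replace the 7x7 stride-2 stem convolution with a 3x3 convolution so that
# small inputs (e.g., 32x32 CIFAR images) keep more spatial detail.
model.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
# Remove the stem max-pooling layer so the feature map is not shrunk further.
model.maxpool = nn.Identity()

x = torch.randn(1, 3, 32, 32)
print(model(x).shape)  # torch.Size([1, 100])
```

With a 32 × 32 input, the modified stem outputs a 32 × 32 × 64 feature map instead of the original 8 × 8 × 64, so subsequent stages see far more spatial detail.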
Result
On the CIFAR-100 (Canadian Institute for Advanced Research), CIFAR-10, and SVHN (street view house numbers) datasets, FDPRNet is compared with 11 other image classification networks. Relative to the second-best model, RTSA Net-101 (residual Net-101 with tensor-synthetic attention), its accuracy is higher by 1.16%, 1.01%, and 0.98%, respectively. The experimental results show that FDPRNet significantly improves classification accuracy.
Conclusion
The proposed FDPRNet enhances the exchange of information within image channels and reduces feature loss; it not only achieves a high level of classification accuracy but also markedly improves the model's generalization ability.
Objective
A residual classification network is a deep convolutional neural network architecture that plays an important and influential role in deep learning and has become one of the most commonly used structures for image classification tasks in computer vision. To solve the problem of network degradation in deep networks, residual networks, unlike the traditional approach of simply stacking convolutional layers, innovatively introduce residual connections: skip connections add the input features directly to the output features and pass the original features on to subsequent layers, forming a shortcut path that better preserves and exploits feature information. Although the residual classification network effectively alleviates gradient explosion and vanishing during deep network training, when the output dimension of a residual block does not match its input dimension, a convolutional mapping is needed to align the dimensions, which causes a large number of pixels in the channel matrix of the residual module to be skipped and thus leads to the loss of feature information. In addition, correlation exists between image channels, and a fixed channel order may lead to feature bias, making it difficult to fully utilize information from other channels and limiting the model's ability to express key features. To address these issues, this article proposes a double-pooling residual classification network based on a feature reordering attention mechanism (FDPRNet).
Method
FDPRNet is based on the ResNet-34 residual network. First, the kernel size of the first convolutional layer is changed from 7 × 7 to 3 × 3, because on relatively small images a large kernel enlarges the receptive field and captures too much useless contextual information. At the same time, the max-pooling layer is removed so that the feature map is not shrunk further, which retains more image information, avoids the loss caused by pooling, and lets subsequent layers extract features more effectively. Then, a feature reordering attention module (FRAM) is proposed. It groups the feature-map channels and reorders them both between and within groups so that adjacent channels are no longer always adjacent; within each group, channels are arranged as an arithmetic sequence with a step size of 1. This operation disrupts part of the original channel order while preserving some of the original front-to-back relations, introducing a degree of randomness that lets the model comprehensively consider the interactions among different channels and avoid excessive dependence on specific channels. One-dimensional convolution extracts the features of each channel combination, the results are concatenated, and a sigmoid activation produces the weights of the reordered features, which are multiplied element by element with the input features to obtain the output of the feature reordering attention mechanism. Finally, a double-pooling residual (DPR) module is proposed, which applies max pooling and average pooling to the feature map in parallel, capturing both the salient and the typical features of the input, enhancing feature expressiveness, and helping the network capture important information in the images, thereby improving model performance. The pooled feature maps are summed element by element and passed through a convolutional mapping to extract key features, reduce the feature-map size, and ensure that the channel matrices can be summed element-wise in the residual connection.
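The abstract does not give the exact grouping rule or layer configuration of the two modules, so the following is a minimal PyTorch sketch under stated assumptions: FRAM is sketched with an interleaved (shuffle-style) channel permutation, global average pooling for the per-channel descriptors, and a shared 1D convolution applied per group; DPR is sketched with a 1 × 1 convolution plus batch normalization as the mapping after the summed poolings. All class names and parameters (FRAM, DPR, groups, kernel_size) are ours, not the paper's.

```python
import torch
import torch.nn as nn


class FRAM(nn.Module):
    """Sketch of the feature reordering attention mechanism (hypothetical).

    Channels are reordered with an interleaved permutation (our assumption),
    a shared 1D convolution scores each reordered group of channel
    descriptors, and a sigmoid gate reweights the input channels.
    """

    def __init__(self, channels, groups=4, kernel_size=3):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        self.pool = nn.AdaptiveAvgPool2d(1)  # one descriptor per channel
        self.conv = nn.Conv1d(1, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)
        # Group g holds channels g, g+groups, g+2*groups, ..., so adjacent
        # channels fall into different groups (inter/intra-group reordering).
        perm = torch.arange(channels).view(-1, groups).t().reshape(-1)
        self.register_buffer("perm", perm)
        self.register_buffer("inv_perm", torch.argsort(perm))

    def forward(self, x):
        b, c, _, _ = x.shape
        desc = self.pool(x).view(b, c)          # (B, C) channel descriptors
        desc = desc[:, self.perm]               # reorder the channels
        desc = desc.view(b * self.groups, 1, c // self.groups)
        w = self.conv(desc).view(b, c)          # per-group 1D conv, concatenated
        w = torch.sigmoid(w)[:, self.inv_perm]  # weights back in input order
        return x * w.view(b, c, 1, 1)           # gate the input features


class DPR(nn.Module):
    """Sketch of the double-pooling residual shortcut (hypothetical).

    Max pooling and average pooling run in parallel, their outputs are
    summed element-wise, and a 1x1 convolution (our assumption) maps the
    channel count so the residual addition is dimensionally valid.
    """

    def __init__(self, in_channels, out_channels, stride=2):
        super().__init__()
        self.maxpool = nn.MaxPool2d(stride, stride)
        self.avgpool = nn.AvgPool2d(stride, stride)
        self.conv = nn.Conv2d(in_channels, out_channels, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        pooled = self.maxpool(x) + self.avgpool(x)  # salient + typical features
        return self.bn(self.conv(pooled))           # match residual dimensions
```

Because FRAM only reweights channels, it preserves the input shape and can sit behind any residual stage; DPR produces the same shape as the strided residual branch, so it can stand in for the original 1 × 1 shortcut.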
Result
On the CIFAR-100, CIFAR-10, SVHN, Flowers-102, and NWPU-RESISC45 datasets, compared with the original ResNet-34, adding FRAM improves accuracy by 1.66%, 0.19%, 0.13%, 4.28%, and 2.00%, respectively; adding DPR improves accuracy by 1.7%, 0.26%, 0.12%, 3.18%, and 1.31%, respectively; and FDPRNet, which combines the FRAM and DPR modules, improves accuracy by 2.07%, 0.3%, 0.17%, 8.31%, and 2.47%, respectively. Compared with four attention mechanisms (squeeze-and-excitation, efficient channel attention, coordinate attention, and the convolutional block attention module), FRAM improves accuracy by an average of 0.72%, 1.28%, and 1.46% on the CIFAR-100, Flowers-102, and STL-10 datasets. In summary, whether on small or large datasets and with few or many categories, both the FRAM and DPR modules improve the recognition accuracy of the ResNet-34 network. Their combination, FDPRNet, improves the recognition rate the most and achieves a significant gain in accuracy over other image classification networks.
Conclusion
The proposed FDPRNet can enhance the information exchange within image channels and reduce feature loss. It not only achieves high classification accuracy but also effectively strengthens the network's feature learning and generalization ability. The main contributions of this article are as follows. 1) FRAM is proposed. It breaks the fixed connections between the original channels and regroups them according to certain rules; learning the weights of channel combinations in different orders lets channels in different groups interact without losing all of the original front-to-back relations between channels, achieving information exchange and channel crossing within the feature map, strengthening the interaction between features, better capturing the correlation between contextual information and features, and improving classification accuracy. 2) DPR is proposed. It replaces the skip connection in the original residual block, solving the feature-information loss caused by the large number of pixels skipped in the channel matrix during the residual module's skip connection. Double pooling captures both the salient and the typical features of the input images, which not only enhances feature expressiveness but also helps the network capture important information in the images and improves classification performance. 3) The proposed FDPRNet inserts the two modules, FRAM and DPR, into the residual network to strengthen channel interaction and feature expression, enabling the model to capture complex relationships and generalize well. It achieves high classification accuracy on several mainstream image classification datasets. A sketch of this assembly follows.
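To make contribution 3) concrete, here is a hypothetical assembly that reuses the FRAM and DPR sketches above together with torchvision's ResNet-34: DPR replaces the strided 1 × 1 shortcut of each downsampling block, and FRAM gates the output of each stage. The insertion points are our illustrative guess, not the paper's verified layout.

```python
# Assumes the FRAM and DPR classes from the sketch above are in scope.
import torch
import torch.nn as nn
from torchvision.models import resnet34

model = resnet34(num_classes=100)
model.conv1 = nn.Conv2d(3, 64, 3, stride=1, padding=1, bias=False)  # 3x3 stem
model.maxpool = nn.Identity()                                       # no stem pooling

# Replace the strided 1x1 shortcut of each downsampling block with DPR.
for stage in (model.layer2, model.layer3, model.layer4):
    block = stage[0]
    block.downsample = DPR(block.conv1.in_channels,
                           block.conv2.out_channels, stride=2)

# Gate each stage's output with FRAM.
for name, width in (("layer1", 64), ("layer2", 128),
                    ("layer3", 256), ("layer4", 512)):
    setattr(model, name, nn.Sequential(getattr(model, name), FRAM(width)))

print(model(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 100])
```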
image classification; feature rearrangement; attention mechanism; residual network; deep learning
Dwibedi D, Aytar Y, Tompson J, Sermanet P and Zisserman A. 2021. With a little help from my friends: nearest-neighbor contrastive learning of visual representations [EB/OL]. [2024-02-04]. http://arxiv.org/pdf/2104.14548.pdf
Han K, Wang Y H, Tian Q, Guo J Y, Xu C J and Xu C. 2020. GhostNet: more features from cheap operations//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 1577-1586 [DOI: 10.1109/cvpr42600.2020.00165]
Hassani A, Walton S, Shah N, Abuduweili A, Li J C and Shi H. 2022. Escaping the big data paradigm with compact transformers [EB/OL]. [2024-02-04]. http://arxiv.org/pdf/2104.05704.pdf
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Hou Q B, Zhou D Q and Feng J S. 2021. Coordinate attention for efficient mobile network design//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 13708-13717 [DOI: 10.1109/CVPR46437.2021.01350]
Hu J, Shen L and Sun G. 2018. Squeeze-and-excitation networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7132-7141 [DOI: 10.1109/CVPR.2018.00745]
Huang Z L, Wang X G, Huang L C, Huang C, Wei Y C and Liu W Y. 2019. CCNet: criss-cross attention for semantic segmentation//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 603-612 [DOI: 10.1109/iccv.2019.00069]
Jiang W T, Zhao L L and Tu C. 2023. Double-branch multi-attention mechanism based sharpness-aware classification network. Pattern Recognition and Artificial Intelligence, 36(3): 252-267 [DOI: 10.16451/j.cnki.issn1003-6059.202303005]
Konstantinidis D, Papastratis I, Dimitropoulos K and Daras P. 2023. Multi-manifold attention for vision transformers [EB/OL]. [2024-02-04]. http://arxiv.org/pdf/2207.08569.pdf
Krizhevsky A, Sutskever I and Hinton G E. 2017. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6): 84-90 [DOI: 10.1145/3065386]
Lan H, Wang X H and Wei X. 2021. Couplformer: rethinking vision transformer with coupling attention map [EB/OL]. [2024-02-04]. http://arxiv.org/pdf/2112.05425.pdf
Lecun Y, Bottou L, Bengio Y and Haffner P. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11): 2278-2324 [DOI: 10.1109/5.726791]
Müller S G and Hutter F. 2021. TrivialAugment: tuning-free yet state-of-the-art data augmentation [EB/OL]. [2024-02-04]. http://arxiv.org/pdf/2103.10158.pdf
Qi G J. 2020. Loss-sensitive generative adversarial networks on Lipschitz densities. International Journal of Computer Vision, 128(5): 1118-1140 [DOI: 10.1007/s11263-019-01265-2]
Qiu Y F, Zhang J X, Lan H and Zong J X. 2023. Improved ResNet image classification model based on tensor synthesis attention. Laser and Optoelectronics Progress, 60(6): #0610008 [DOI: 10.3788/LOP212836]
Sandler M, Howard A, Zhu M L, Zhmoginov A and Chen L C. 2018. MobileNetV2: inverted residuals and linear bottlenecks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4510-4520 [DOI: 10.1109/CVPR.2018.00474]
Selvaraju R R, Cogswell M, Das A, Vedantam R, Parikh D and Batra D. 2017. Grad-CAM: visual explanations from deep networks via gradient-based localization//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 618-626 [DOI: 10.1109/ICCV.2017.74]
Simonyan K and Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition [EB/OL]. [2024-02-04]. http://arxiv.org/pdf/1409.1556.pdf
Song W, Cai W Y, He S Q and Li W J. 2021. Dynamic graph convolution with spatial attention for point cloud classification and segmentation. Journal of Image and Graphics, 26(11): 2691-2702 [DOI: 10.11834/jig.200550]
Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V and Rabinovich A. 2015. Going deeper with convolutions//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE: 1-9 [DOI: 10.1109/CVPR.2015.7298594]
Tan M X and Le Q V. 2020. EfficientNet: rethinking model scaling for convolutional neural networks [EB/OL]. [2024-02-04]. http://arxiv.org/pdf/1905.11946.pdf
Wang Q L, Wu B G, Zhu P F, Li P H, Zuo W M and Hu Q H. 2020. ECA-Net: efficient channel attention for deep convolutional neural networks//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 11531-11539 [DOI: 10.1109/CVPR42600.2020.01155]
Wang X L, Girshick R, Gupta A and He K M. 2018. Non-local neural networks [EB/OL]. [2024-02-04]. http://arxiv.org/pdf/1711.07971.pdf
Woo S, Park J, Lee J Y and Kweon I S. 2018. CBAM: convolutional block attention module [EB/OL]. [2024-02-04]. http://arxiv.org/pdf/1807.06521.pdf
Wu B, Liu Y A and Zhao J. 2024. Classification network for 3D point cloud based on spatial structure convolution and attention mechanism. Journal of Image and Graphics, 29(2): 520-532 [DOI: 10.11834/jig.230137]
Xie S N, Girshick R, Dollár P, Tu Z W and He K M. 2017. Aggregated residual transformations for deep neural networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 5987-5995 [DOI: 10.1109/CVPR.2017.634]
Zagoruyko S and Komodakis N. 2017. Wide residual networks [EB/OL]. [2024-02-04]. http://arxiv.org/pdf/1605.07146.pdf
Zhang H Y, Cisse M, Dauphin Y N and Lopez-Paz D. 2018. mixup: beyond empirical risk minimization [EB/OL]. [2024-02-04]. http://arxiv.org/pdf/1710.09412.pdf
Zhang K, Guo Y R, Wang X S, Yuan J S, Ma Z Y and Zhao Z B. 2019. Channel-wise and feature-points reweights DenseNet for image classification//Proceedings of 2019 IEEE International Conference on Image Processing (ICIP). Taipei, China: IEEE: 410-414 [DOI: 10.1109/ICIP.2019.8802982]
Zhang X Y, Zhou X Y, Lin M X and Sun J. 2017. ShuffleNet: an extremely efficient convolutional neural network for mobile devices [EB/OL]. [2024-02-04]. http://arxiv.org/pdf/1707.01083.pdf
Zhou B L, Khosla A, Lapedriza A, Oliva A and Torralba A. 2016. Learning deep features for discriminative localization//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 2921-2929 [DOI: 10.1109/CVPR.2016.319]