Self-knowledge distillation for fine-grained image classification
2024, Vol. 29, No. 12, Pages 3756-3769
Print publication date: 2024-12-16
DOI: 10.11834/jig.230846
Zhang Rui, Chen Yao, Wang Jiabao, Li Yang, Zhang Xu. 2024. Self-knowledge distillation for fine-grained image classification. Journal of Image and Graphics, 29(12):3756-3769
Objective
Without the guidance of a teacher model, self-knowledge distillation allows a model to improve its performance by learning from its own knowledge. When applied to fine-grained image classification, however, such methods fail to effectively extract features from the discriminative regions of an image, which leads to unsatisfactory distillation results. To address this problem, we propose a self-knowledge distillation learning method for fine-grained image classification that incorporates efficient channel attention.
Method
First, we introduce the efficient channel attention (ECA) module, design an ECA residual block, and construct the lightweight ECA-ResNet18 (residual network) backbone to better extract multi-scale features from the discriminative regions of an image. Second, we build an efficient channel attention weighted bidirectional feature pyramid (ECA-BiFPN) module that fuses features of different scales into more robust cross-scale features. Finally, we propose a multi-level feature knowledge distillation loss through which the cross-scale features supervise the distillation of the multi-scale features.
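As a concrete illustration of this design, the following PyTorch-style sketch shows one possible ECA residual block: a ResNet18 basic block with an ECA gate inserted after each batch normalization. This is a minimal sketch under our own naming and layout assumptions (the adaptive kernel-size rule follows the ECA-Net paper), not the authors' released implementation.

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient channel attention: GAP -> 1D conv across channels -> sigmoid gate."""
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        # Adaptive kernel size as in ECA-Net (assumed here).
        t = int(abs((math.log2(channels) + b) / gamma))
        k = t if t % 2 else t + 1
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                          # x: (N, C, H, W)
        w = x.mean(dim=(2, 3))                     # global average pooling -> (N, C)
        w = self.conv(w.unsqueeze(1)).squeeze(1)   # 1D conv over the channel dimension
        w = self.sigmoid(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                               # channel-wise re-weighting

class ECAResidualBlock(nn.Module):
    """ResNet18-style basic block with an ECA module after each batch norm (assumed layout)."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.eca1 = ECA(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.eca2 = ECA(out_ch)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = None
        if stride != 1 or in_ch != out_ch:
            self.downsample = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride, bias=False),
                nn.BatchNorm2d(out_ch))

    def forward(self, x):
        identity = x if self.downsample is None else self.downsample(x)
        out = self.relu(self.eca1(self.bn1(self.conv1(x))))
        out = self.eca2(self.bn2(self.conv2(out)))
        return self.relu(out + identity)
```

Stacking two such blocks per stage, in place of the plain basic blocks of ResNet18, yields the ECA-ResNet18 backbone described above.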
Result
On the three public datasets Caltech-UCSD Birds 200, Stanford Cars, and FGVC-Aircraft, the proposed method achieves classification accuracies of 76.04%, 91.11%, and 87.64%, respectively, exceeding the best of 15 existing self-knowledge distillation methods by 2.63%, 1.56%, and 3.66%.
Conclusion
The proposed method efficiently extracts features from the discriminative regions of an image and achieves better fine-grained classification accuracy. Its lightweight network model is well suited to edge computing applications on embedded devices.
Objective
Fine-grained image classification aims to divide a super-category into multiple sub-categories. The task is more challenging than general image classification because of subtle inter-class differences and large intra-class variations. Attention mechanisms enable a model to focus on key areas of the input image and on its discriminative regional features, which are particularly useful for fine-grained classification, and attention-based classification models also show high interpretability. To strengthen the model's focus on discriminative regions, attention-based methods have therefore been applied to fine-grained image classification. Although current attention-based fine-grained classification models achieve high accuracy, they give little consideration to the number of parameters and the computational cost, so they are difficult to deploy on low-resource devices, which greatly limits their practical application. Knowledge distillation transfers knowledge from a large teacher model, which has high accuracy but many parameters and heavy computation, to a small student model with few parameters and low computational cost, thereby improving the student's performance and reducing the cost of model learning. To reduce the learning cost further, researchers have proposed self-knowledge distillation, which, unlike traditional knowledge distillation, allows a model to improve its performance by exploiting its own knowledge instead of relying on a teacher network. However, such methods fall short on fine-grained image classification tasks because they do not effectively extract discriminative region features from images, which leads to unsatisfactory distillation results. To address this issue, we propose a self-knowledge distillation learning method for fine-grained image classification that fuses efficient channel attention (ECASKD).
Method
The proposed method embeds an efficient channel attention mechanism into a self-knowledge distillation framework to effectively extract the discriminative regional features of images. The framework consists of a self-knowledge distillation network, composed of a lightweight backbone and a self-teacher subnetwork, and a joint loss that combines a classification loss, a knowledge distillation loss, and a multi-level feature-based knowledge distillation loss. First, we introduce the efficient channel attention (ECA) module, propose the ECA-Residual block, and construct the lightweight ECA-ResNet18 (residual network) backbone to improve the extraction of multi-scale features from discriminative regions of the input image. Compared with the residual block of the original ResNet18, the ECA-Residual block inserts an ECA module after each batch normalization operation; two ECA-Residual blocks form one stage of the ECA-ResNet18 backbone, which strengthens the network's focus on discriminative regions of the image and facilitates the extraction of multi-scale features. Unlike the plain ResNet18 commonly used in self-knowledge distillation methods, the proposed backbone is built from ECA-Residual blocks, which markedly enhances the model's ability to extract multi-scale features while remaining lightweight and computationally efficient. Second, because the features output by the backbone at different scales differ in importance, we design the efficient channel attention bidirectional feature pyramid network (ECA-BiFPN) block, which weights channels during feature fusion so that features from different channels contribute differently to the fine-grained classification task. Finally, we propose a multi-level feature-based knowledge distillation loss that encourages the backbone to learn from the self-teacher subnetwork and to focus on discriminative regions.
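The joint objective described above can be sketched as follows. The abstract does not give the exact loss formulation, so this is a hedged illustration using common choices: cross-entropy for classification, temperature-scaled KL divergence between backbone and self-teacher logits for logit distillation, and mean-squared error between corresponding multi-scale and fused cross-scale features for the multi-level feature distillation. The function name and the weights `alpha`, `beta`, and temperature `T` are assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def joint_loss(student_logits, teacher_logits, student_feats, teacher_feats,
               targets, T=4.0, alpha=1.0, beta=1.0):
    """Hypothetical joint objective: classification + logit KD + multi-level feature KD.

    student_logits: logits from the backbone classifier
    teacher_logits: logits from the self-teacher subnetwork (used as targets)
    student_feats / teacher_feats: lists of same-shaped feature maps, one pair per stage
    """
    # Standard classification loss on the backbone predictions.
    ce = F.cross_entropy(student_logits, targets)

    # Logit-level self-distillation with temperature T (teacher side detached).
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits.detach() / T, dim=1),
                  reduction="batchmean") * (T * T)

    # Multi-level feature distillation: pull each backbone (multi-scale) feature
    # toward the fused cross-scale feature produced by the self-teacher branch.
    feat_kd = sum(F.mse_loss(s, t.detach()) for s, t in zip(student_feats, teacher_feats))
    feat_kd = feat_kd / len(student_feats)

    return ce + alpha * kd + beta * feat_kd
```

Detaching the teacher-side tensors, as in the sketch, is one common design choice in self-distillation frameworks: the self-teacher branch is then driven only by its own supervision rather than by the distillation terms.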
Result
The proposed method achieves classification accuracies of 76.04%, 91.11%, and 87.64% on three public datasets, namely Caltech-UCSD Birds 200 (CUB), Stanford Cars (CAR), and FGVC-Aircraft (AIR). For a comprehensive and objective evaluation, we compare ECASKD with 15 existing methods, including data-augmentation-based, auxiliary-network-based, and attention-based methods. Against the state-of-the-art (SOTA) data-augmentation-based method, ECASKD improves accuracy by 3.89%, 1.94%, and 4.69% on CUB, CAR, and AIR, respectively. Against the SOTA auxiliary-network-based method, it improves accuracy by 6.17%, 4.93%, and 7.81%. Against the SOTA method that combines an auxiliary network with data augmentation, it improves accuracy by 2.63%, 1.56%, and 3.66%, showing that ECASKD achieves better fine-grained classification performance even without data augmentation. Against the SOTA attention-based self-knowledge distillation method, ECASKD improves accuracy by about 23.28%, 8.17%, and 14.02% on CUB, CAR, and AIR, respectively. In sum, ECASKD outperforms all three types of self-knowledge distillation methods and delivers better fine-grained image classification performance. We also compare the method with four mainstream backbone models in terms of the number of parameters, floating-point operations (FLOPs), and top-1 classification accuracy. Compared with ResNet18, the ECA-ResNet18 backbone used in the proposed method significantly improves classification accuracy at a cost of only 0.4 M additional parameters and 0.2 G additional FLOPs. Compared with the larger ResNet50, the proposed method requires less than half the parameters and computation, yet its classification accuracy on the CAR dataset is only 0.6% lower. Compared with the much larger ViT-Base and Swin-Transformer-B, the proposed method uses about one-eighth of their parameters and computation, and its accuracies on the CAR and AIR datasets are 3.7% and 5.3% lower than those of the best-performing Swin-Transformer-B. These results show that the proposed method substantially improves classification accuracy with only a small increase in model complexity.
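The parameter budgets quoted above can be verified for any PyTorch model with a short utility; the snippet below is a generic sketch rather than part of the authors' code, and the model class `ECASKDNet` in the usage comment is hypothetical.

```python
import torch

def count_parameters(model: torch.nn.Module) -> float:
    """Return the number of trainable parameters in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# Example usage (hypothetical model class):
# model = ECASKDNet(num_classes=196)
# print(f"{count_parameters(model):.1f} M parameters")
# FLOPs can be measured with a third-party profiler on a single forward pass.
```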
Conclusion
The proposed self-knowledge distillation method for fine-grained image classification achieves strong performance with 11.9 M parameters and 2.0 G FLOPs, and its lightweight network model is suitable for edge computing applications on embedded devices.
Keywords: fine-grained image classification; channel attention; knowledge distillation (KD); self-knowledge distillation (SKD); feature fusion; convolutional neural network (CNN); lightweight model