Knowledge-enhanced feature editing reconstruction distillation
2025, Vol. 30, No. 1, Pages 161-172
Print publication date: 2025-01-16
DOI: 10.11834/jig.230890
Song Tao, Zhang Jingtao, Li Weiwei, Zhao Mingfu, Ran Lu, Ye Dingxing, Yang Yichen, Yue Daiheng. Knowledge-enhanced feature editing reconstruction distillation[J]. Journal of Image and Graphics, 2025, 30(1): 161-172.
Objective
Knowledge distillation, as an effective and practical model compression method, has been widely studied in image classification. Because the complexity of object detection networks makes distillation more difficult, existing knowledge distillation methods applied to object detection perform unsatisfactorily. To this end, a knowledge distillation method suited to object detection, namely knowledge-enhanced feature editing reconstruction distillation, is proposed to achieve effective compression of object detection models.
Method
Existing methods distill only between the corresponding feature layers of the teacher and student models and therefore cannot fully exploit the hidden "dark knowledge" in the teacher model. To address this problem, bottom-up and top-down multi-scale feature fusion is performed on the teacher features through spatial attention and channel attention, respectively, for knowledge enhancement. Furthermore, when the capability gap between the teacher and student models is too large, the student cannot understand the teacher's knowledge and distillation performance is limited. To address this problem, part of the teacher's features are fused into the student's features as prior knowledge to narrow the representation gap between the two models; detail information such as edges and contours is then deleted from the student features to realize feature editing, after which the student model is forced to recover the deleted details from the remaining features, combined with the prior knowledge, through a simple convolutional block, realizing feature reconstruction. The student model receives positive feedback in this process and thereby learns better features.
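As an illustration of the knowledge enhancement step described above, the following is a minimal PyTorch sketch of attention-gated multi-scale fusion of the teacher's pyramid features. The module name, the SE/CBAM-style attention blocks, the additive fusion, and the assumption that all pyramid levels share one channel width are illustrative choices, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KnowledgeEnhancer(nn.Module):
    """Hypothetical sketch: enhance teacher pyramid features by fusing
    neighbouring scales, gated by channel attention on the top-down path
    and spatial attention on the bottom-up path."""

    def __init__(self, channels):
        super().__init__()
        # Channel attention (squeeze-and-excitation style).
        self.ca = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid(),
        )
        # Spatial attention over pooled channel statistics.
        self.sa = nn.Sequential(nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())

    def spatial_att(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return self.sa(torch.cat([avg, mx], dim=1))

    def forward(self, feats):  # feats: teacher maps, highest resolution first
        # Top-down fusion: coarser levels are upsampled and gated by
        # channel attention before being added to finer levels.
        for i in range(len(feats) - 2, -1, -1):
            up = F.interpolate(feats[i + 1], size=feats[i].shape[-2:], mode="nearest")
            feats[i] = feats[i] + self.ca(up) * up
        # Bottom-up fusion: finer levels are downsampled and gated by
        # spatial attention before being added to coarser levels.
        for i in range(1, len(feats)):
            down = F.adaptive_max_pool2d(feats[i - 1], feats[i].shape[-2:])
            feats[i] = feats[i] + self.spatial_att(down) * down
        return feats
```

The sharing of one attention module across levels assumes the usual FPN setting in which every level has the same number of channels.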
Result
Experiments on two datasets compared three different types of ResNet50 (residual network)-based detectors, RetinaNet (retina network), Faster R-CNN (faster region-based convolutional neural network), and FCOS (fully convolutional one-stage object detection), with four recent methods. On the VOC2007 (visual object classes 2007) test set, the mean average precision (mAP) improved over the baselines by 2.1%, 2.7%, and 3.8%, respectively; on the NEU-DET (Northeastern University surface defect database) test set, the mAP improved over the baselines by 2.7%, 2.6%, and 2.1%, respectively, all higher than the current best-performing algorithms.
Conclusion
The proposed method fully exploits the capability of the teacher model, effectively improves the performance of the student model, and is applicable to multiple types of object detectors.
Objective
In recent years, convolutional neural networks have shown great potential in various fields of computer vision owing to their excellent feature extraction ability. However, as model performance increases, model size becomes increasingly bloated, and the large number of parameters slows inference. Even with GPU acceleration, the real-time requirements of many application scenarios cannot be met. In addition, the memory and storage these models occupy increase the cost of use. As a result, large models are difficult to deploy and run on mobile devices or embedded platforms with limited computing power and storage, which restricts their adoption. Therefore, how to compress large deep neural network models is a key issue. Knowledge distillation is a simple and effective model compression method. Unlike model pruning or parameter quantization, knowledge distillation is essentially a special model training method: it does not directly modify a model's structure or parameters; rather, compression is achieved by training a small network under the guidance of a large one. In addition to learning the hard labels in the training set, the student model uses the classification output of the teacher model as soft labels to guide its learning, so that the hidden "dark knowledge" in the teacher model, which is powerful but has a bloated network structure, is transferred to the student model, which has a relatively simple network structure and fewer parameters. This knowledge transfer enables the student model, with fewer parameters and faster inference, to achieve accuracy comparable to that of the teacher model, thus achieving model compression. However, the object detection task requires not only classifying targets but also outputting their specific positions in the image, which cannot be accomplished merely by learning the labels output by the teacher model. Therefore, traditional knowledge distillation methods designed for classification do not work well on object detection. Moreover, because the network structure of a detector is more complex, existing knowledge distillation methods for object detection usually let the features of the student model directly imitate the features of the teacher model rather than learning the teacher's labels. These existing methods still have multiple limitations. Therefore, a new knowledge distillation method applicable to object detection is proposed, namely feature editing reconstruction distillation based on knowledge enhancement, to achieve effective compression of object detection models.
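For context, the classic soft-label distillation objective for classification mentioned above can be sketched as follows. This minimal PyTorch snippet illustrates generic logit distillation, not the feature-based method proposed in this paper; the temperature T and weight alpha are assumed values.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Classic classification distillation: hard-label cross-entropy plus
    temperature-softened KL divergence against the teacher's soft labels."""
    # Hard-label supervision from the ground truth.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label supervision from the teacher, softened by temperature T.
    soft_t = F.softmax(teacher_logits / T, dim=1)
    log_soft_s = F.log_softmax(student_logits / T, dim=1)
    # T*T rescaling keeps gradient magnitudes comparable across temperatures.
    kl = F.kl_div(log_soft_s, soft_t, reduction="batchmean") * (T * T)
    return alpha * ce + (1.0 - alpha) * kl
```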
Method
Two modules are constructed to address two common problems of current knowledge distillation methods for object detection: 1) a knowledge enhancement module and 2) a feature editing and reconstruction module. Existing methods distill only between the corresponding feature layers of the teacher and student models and therefore cannot fully exploit the hidden "dark knowledge" in the teacher model. To address this problem, the knowledge enhancement module performs bottom-up and top-down multi-scale feature fusion of the teacher features through spatial attention and channel attention, respectively. Furthermore, as the performance of the teacher model continues to improve, the capability gap between student and teacher grows, and the performance gains of the student model gradually saturate or even decrease. The feature editing and reconstruction module therefore constructs a new distillation paradigm. Part of the teacher's features are fused into the student's features as prior knowledge to narrow the representation gap between the two models. Detail information such as edges and contours is then randomly deleted from the student features through a pixel-level mask to realize feature editing. Finally, the student model is forced to recover the deleted details from the remaining features, combined with the prior knowledge, through a simple convolutional block, realizing feature reconstruction. By optimizing the quality of the reconstructed feature maps during training, gradients are back-propagated to the student's original feature maps, so the student learns features with stronger representational capability.
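To make the feature editing and reconstruction step concrete, here is a minimal PyTorch sketch under stated assumptions: student and teacher features are already aligned in shape, the teacher prior is fused by simple addition into the masked positions, the reconstruction target is the (enhanced) teacher feature, and names such as FeatureReconstructor and mask_ratio are hypothetical rather than the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureReconstructor(nn.Module):
    """Hypothetical sketch: edit (randomly mask) the student feature map,
    fuse part of the teacher feature as prior knowledge, then recover the
    deleted details with a simple convolutional block."""

    def __init__(self, channels, mask_ratio=0.5):
        super().__init__()
        self.mask_ratio = mask_ratio
        # "Simple convolutional block" that reconstructs the masked details.
        self.recon = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, feat_s, feat_t):
        # Feature editing: a pixel-level random mask deletes details
        # (edges, contours) from the student feature map.
        b, _, h, w = feat_s.shape
        keep = (torch.rand(b, 1, h, w, device=feat_s.device) > self.mask_ratio).float()
        edited = feat_s * keep
        # Prior knowledge: teacher features fill the masked positions
        # (simple additive fusion assumed here; teacher is not updated).
        prior = feat_t.detach() * (1.0 - keep)
        # Feature reconstruction from the remaining features plus the prior.
        recon = self.recon(edited + prior)
        # The reconstruction loss drives gradients back into feat_s,
        # giving the student positive feedback on its feature quality.
        return F.mse_loss(recon, feat_t.detach())
```

In use, this loss would be computed per pyramid level and added, with some weighting, to the detector's ordinary training loss.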
Result
Experiments were conducted with three different types of detectors, retina network (RetinaNet), faster region-based convolutional neural network (Faster R-CNN), and fully convolutional one-stage object detection (FCOS), on the general object detection dataset visual object classes 2007 (VOC2007) and the steel surface defect dataset Northeastern University surface defect database (NEU-DET), using a ResNet101 teacher model and a ResNet50 student model. First, feature map visualization shows that the feature maps of the distilled detectors respond significantly less to background noise and pay more attention to critical foreground regions. A comparison of the visualized detection results shows that the distilled detectors markedly reduce false and missed detections. Evaluated with the mean average precision (mAP) metric, the mAP on the VOC2007 test set improved over the baselines by 2.1%, 2.7%, and 3.8%, respectively, and the mAP on the NEU-DET test set improved over the baselines by 2.7%, 2.6%, and 2.1%, respectively.
Conclusion
In this study, a new knowledge distillation method for object detection is proposed. Experimental results show that the method significantly improves the focus of the detector's feature maps on key target regions and reduces the interference of noise, thereby lowering the false detection and missed detection rates, and its accuracy gains exceed those of several state-of-the-art algorithms. The proposed method is suitable for both general object detection datasets and specialized defect detection datasets, generalizes well, and can be applied to many types of detectors.
model compression; knowledge distillation; knowledge enhancement; feature reconstruction; object detection