Spatial-spectral model distillation network for hyperspectral scene classification
- 2024, Vol. 29, No. 8, pp. 2205-2219
Print publication date: 2024-08-16
DOI: 10.11834/jig.230699
Xue Jie, Huang Hong, Pu Chunyu, Yang Yinming, Li Yuan, Liu Yingxu. 2024. Spatial-spectral model distillation network for hyperspectral scene classification. Journal of Image and Graphics, 29(08):2205-2219
Objective
Existing scene classification methods mainly target high-spatial-resolution images, which contain very limited spectral information; in addition, existing methods based on convolutional neural networks (CNNs) miss long-range contextual information because of the locality of the convolution operation. To address these problems, a spatial-spectral model distillation network for hyperspectral scene classification (SSMD) is proposed.
Method
A spatial-spectral attention-based vision Transformer (SSViT) is chosen to probe the spectral information of different categories, performing fine-grained classification of ground objects by exploiting the differences among their spectra. Knowledge distillation then transfers the long-range dependency information captured by the teacher model SSViT to the student model VGG16 (Visual Geometry Group 16). The two models cooperate: the spectral and global information extracted by the teacher is fused with the local information extracted by the student, which further improves the student's classification performance while keeping the time cost low.
Result
Experiments on three datasets compare the proposed method with 10 classification methods (5 traditional CNN-based methods and 5 recent scene classification methods). Considering both time cost and classification accuracy, the proposed method leads on the different datasets to varying degrees. On the OHID-SC (Orbita hyperspectral image scene classification dataset), OHS-SC (Orbita hyperspectral scene classification dataset), and HSRS-SC (hyperspectral remote sensing dataset for scene classification) datasets, classification accuracy improves over the second-best model by 13.1%, 2.9%, and 0.74%, respectively. Comparative experiments on the OHID-SC dataset further show that the proposed algorithm effectively improves hyperspectral scene classification accuracy.
Conclusion
The proposed SSMD network not only makes effective use of the target spectral information of hyperspectral data but also explores the relationship between global and local features; it combines the advantages of traditional and deep learning models to produce more accurate classification results.
Objective
In recent years, advances in remote sensing technology have enabled the acquisition of abundant remote sensing images and large datasets. Scene classification, a key task in remote sensing research, aims to distinguish and classify images with similar scene features by assigning a fixed semantic label to each scene image. Various scene classification methods have been proposed, including handcrafted-feature-based and deep-learning-based methods. Handcrafted-feature-based methods are limited in describing scene semantics because they place high demands on the feature descriptors. Deep-learning-based methods, by contrast, have shown powerful feature extraction capabilities and have been widely applied to scene classification. However, current scene classification methods mainly target remote sensing images with high spatial resolution, which are mostly three-channel images with limited spectral information. This limitation often leads to confusion and misclassification among categories that are visually similar in geometric structure, texture, and color. Integrating spectral information to improve scene classification accuracy has therefore become an important research direction. Existing methods still have shortcomings. Convolutional operations are translation-invariant and sensitive to local information, which makes capturing long-range contextual information difficult. Transformer-based methods can extract long-range dependencies but have limited capability to learn local information. Moreover, simply combining convolutional neural networks (CNNs) with Transformers incurs high computational complexity, which hinders the balance between inference efficiency and classification accuracy.
This study proposes a hyperspectral scene classification method, the spatial-spectral model distillation (SSMD) network, to address these issues.
Method
In this study, we utilize spectral information to improve scene classification accuracy and overcome the limitations of existing methods. First, we propose a spatial-spectral joint self-attention mechanism based on the vision Transformer (SSViT) to fully exploit the spectral information of hyperspectral images. SSViT integrates spectral information into the Transformer architecture; by exploring the intrinsic relationships between pixels and between spectral bands, it extracts richer features. In the spatial-spectral joint mechanism, SSViT leverages the spectral information of different categories to identify the differences between them, which enables fine-grained classification of land cover and improves scene classification accuracy. Second, we introduce knowledge distillation to further enhance classification performance. In the teacher-student framework, SSViT serves as the teacher model, and a pretrained Visual Geometry Group 16 (VGG16) network serves as the student model to capture the contextual information of complex scenes. The teacher model extracts spectral information and global features among samples, while the student model focuses on capturing local features. The student model learns and mimics the prior knowledge of the teacher, which improves its discriminative ability; jointly training the two models enables comprehensive extraction of land cover features and improves scene classification accuracy. Specifically, the image is divided into 64 patches in the spatial dimension and 32 spectral bands in the spectral dimension, and each patch and each band is regarded as a token. Every token is flattened into a row vector and mapped to a fixed dimension through a linear layer. The learned class vector is concatenated with the embedded samples and used for the teacher model's final classification prediction.
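The patch-and-band tokenization just described can be sketched as follows. The abstract gives only the token counts (64 spatial patches, 32 spectral bands); the scene size, the 8 × 8 patch grid, the embedding dimension, and the random projection weights below are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hyperspectral scene: 32 bands of 64 x 64 pixels (sizes assumed).
image = rng.standard_normal((32, 64, 64))
embed_dim = 128  # assumed embedding size

# Spatial tokens: split the scene into an 8 x 8 grid of patches (64 tokens)
# and flatten each patch, across all bands, into one row vector.
patches = image.reshape(32, 8, 8, 8, 8).transpose(1, 3, 0, 2, 4).reshape(64, -1)

# Spectral tokens: each of the 32 bands, flattened, is one token.
bands = image.reshape(32, -1)

# Linear projection of every token to the embedding dimension (the "linear layer").
w_spatial = rng.standard_normal((patches.shape[1], embed_dim)) / np.sqrt(patches.shape[1])
w_spectral = rng.standard_normal((bands.shape[1], embed_dim)) / np.sqrt(bands.shape[1])
spatial_tokens = patches @ w_spatial    # (64, embed_dim)
spectral_tokens = bands @ w_spectral    # (32, embed_dim)

# Prepend a learnable class vector used for the final classification prediction.
cls_token = rng.standard_normal((1, embed_dim))
tokens = np.concatenate([cls_token, spatial_tokens, spectral_tokens], axis=0)
print(tokens.shape)  # 1 class token + 64 spatial + 32 spectral tokens
```

The resulting token sequence (class token plus spatial and spectral tokens) is what the Transformer encoder consumes.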
A position vector is generated and concatenated with the tokens described above as the input to the Transformer. The multi-head attention mechanism outputs encoded representations containing information from different subspaces to model global contextual information, which improves the representation capacity and learning effectiveness of the model. Finally, features are integrated through a multilayer perceptron and a classification layer to produce the classification result. Knowledge distillation proceeds in two stages. The first stage optimizes the teacher and student models by minimizing a loss function weighted by distillation coefficients. In the second stage, the student model is further adjusted with its own loss function, so that supervision from the high-performing complex model guides the training of the simple model toward higher accuracy and better classification performance. The complex model is referred to as the teacher, and the simpler model as the student. This training mode provides the student model with more informative targets, allowing it to directly learn the generalization ability of the teacher model.
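The distillation objective with coefficients described above can be sketched in the standard temperature-scaled form. The abstract does not give the paper's exact loss, temperature, or coefficient values, so the formulation and the values of `T` and `alpha` below are illustrative assumptions:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; larger T gives softer distributions."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Weighted sum of a soft (teacher-supervised) term and a hard
    (label-supervised) cross-entropy term; alpha is the distillation
    coefficient. T and alpha are illustrative, not the paper's values."""
    # Soft term: cross-entropy between softened teacher and student
    # distributions, scaled by T^2 to keep gradient magnitudes comparable.
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T) + 1e-12)
    soft = -(p_teacher * log_p_student).sum(axis=-1).mean() * T * T

    # Hard term: ordinary cross-entropy against the ground-truth labels.
    log_p = np.log(softmax(student_logits) + 1e-12)
    hard = -log_p[np.arange(len(labels)), labels].mean()

    return alpha * soft + (1 - alpha) * hard

# Toy batch: 4 samples, 3 scene classes.
rng = np.random.default_rng(0)
student = rng.standard_normal((4, 3))
teacher = rng.standard_normal((4, 3))
labels = np.array([0, 2, 1, 1])
loss = distillation_loss(student, teacher, labels)
print(float(loss))
```

Minimizing the soft term pushes the student's output distribution toward the teacher's, which is how the teacher's global, spectral knowledge reaches the local student network.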
Result
We compare our model with 10 models, including 5 traditional CNN classification methods and 5 recent scene classification methods, on 3 public datasets: OHID-SC (Orbita hyperspectral image scene classification dataset), OHS-SC (Orbita hyperspectral scene classification dataset), and HSRS-SC (hyperspectral remote sensing dataset for scene classification). The quantitative evaluation metrics include overall accuracy, standard deviation, and the confusion matrix; confusion matrices on the three datasets are provided to clearly display the classification results of the proposed algorithm. Experimental results show that our model outperforms all other methods on the three datasets, improving classification accuracy on OHID-SC, OHS-SC, and HSRS-SC by 13.1%, 2.9%, and 0.74%, respectively, over the second-best model. Comparative experiments on the OHID-SC dataset further show that the proposed algorithm effectively improves the classification accuracy of hyperspectral scenes.
Conclusion
The proposed SSMD network not only effectively utilizes the target spectral information of hyperspectral data but also explores the feature relationship between the global and local levels; it combines the advantages of traditional and deep learning models and produces more accurate classification results.
Keywords: hyperspectral scene classification; convolutional neural network (CNN); Transformer; spatial-spectral joint self-attention mechanism; knowledge distillation (KD)