自监督学习下小样本遥感图像场景分类
Self-supervised learning based few-shot remote sensing scene image classification
2022年，第27卷，第11期，页码：3371-3381
纸质出版日期: 2022-11-16 ,
录用日期: 2021-10-24
DOI: 10.11834/jig.210486
张睿, 杨义鑫, 李阳, 王家宝, 苗壮, 李航, 王梓祺. 自监督学习下小样本遥感图像场景分类[J]. 中国图象图形学报, 2022,27(11):3371-3381.
Rui Zhang, Yixin Yang, Yang Li, Jiabao Wang, Zhuang Miao, Hang Li, Ziqi Wang. Self-supervised learning based few-shot remote sensing scene image classification[J]. Journal of Image and Graphics, 2022,27(11):3371-3381.
目的
卷积神经网络(convolutional neural network,CNN)在遥感场景图像分类中广泛应用,但缺乏训练数据依然是不容忽视的问题。小样本遥感场景分类是指模型只需利用少量样本训练即可完成遥感场景图像分类任务。虽然现有基于元学习的小样本遥感场景图像分类方法可以摆脱大数据训练的依赖,但模型的泛化能力依然较弱。为了解决这一问题,本文提出一种基于自监督学习的小样本遥感场景图像分类方法来增加模型的泛化能力。
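为便于理解小样本分类的任务设定，下面给出一个 N-way K-shot 任务（支持集/查询集）采样的示意代码（仅为概念性示例，类别数、样本数等均为假设参数，并非论文的具体实现）：

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way=5, k_shot=1, q_query=15):
    """从带标签数据集中随机构造一个 N-way K-shot 小样本任务。
    dataset: [(image, label), ...]；返回支持集与查询集两个列表。"""
    by_class = defaultdict(list)
    for image, label in dataset:
        by_class[label].append(image)
    classes = random.sample(list(by_class), n_way)                 # 随机抽取 N 个类别
    support, query = [], []
    for new_label, c in enumerate(classes):
        images = random.sample(by_class[c], k_shot + q_query)
        support += [(img, new_label) for img in images[:k_shot]]   # 每类 K 个带标签样本
        query += [(img, new_label) for img in images[k_shot:]]     # 其余样本作为查询样本
    return support, query
```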
方法
本文方法分为两个阶段。首先,使用元学习训练老师网络直到收敛;然后,双学生网络和老师网络对同一个输入进行预测。老师网络的预测结果会通过蒸馏损失指导双学生网络的训练。另外,在图像特征进入分类器之前,自监督对比学习通过度量同类样本的类中心距离,使模型学习到更明确的类间边界。两种自监督机制能够使模型学习到更丰富的类间关系,从而提高模型的泛化能力。
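为说明老师网络如何通过软标签指导学生网络，这里给出蒸馏损失的一个极简示意（基于 PyTorch 的假设性写法，温度参数 T 等均为示例设定，并非论文的原始实现）：

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL 散度蒸馏损失示意：使学生网络的预测分布逼近老师网络的软标签分布。
    T 为温度超参数（假设值），用于软化两个概率分布。"""
    soft_teacher = F.softmax(teacher_logits / T, dim=1)        # 老师网络软标签
    log_student = F.log_softmax(student_logits / T, dim=1)     # 学生网络对数概率
    # batchmean 按样本数平均；乘 T^2 以保持梯度量级（知识蒸馏中的常用做法）
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * (T * T)
```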
结果
本文在NWPU-RESISC45(North Western Polytechnical University-remote sensing image scene classification)、AID(aerial image dataset)和UCMerced LandUse(UC merced land use dataset)3个数据集上进行实验。在5-way 1-shot条件下，本文方法的精度在3个数据集上分别达到了72.72%±0.15%、68.62%±0.76%和68.21%±0.65%，比Relation Net*模型分别提高了4.43%、1.93%和0.68%。随着可用标签的增加，本文方法的提升作用依然能够保持，在5-way 5-shot条件下，本文方法的精度比Relation Net*分别提高3.89%、2.99%和1.25%。
结论
本文方法可以使模型学习到更丰富的类内类间关系,有效提升小样本遥感场景图像分类模型的泛化能力。
Objective
Convolutional neural networks (CNNs) have been widely used in remote sensing scene image classification, but data-driven models suffer from overfitting and low robustness when training data are scarce. The shortage of labeled samples remains a major obstacle to training models for remote sensing scene image classification, so an effective algorithm that can adapt to small-scale data is required. Few-shot learning can improve the generalization ability of a model trained with only a few labeled samples. However, existing meta-learning-based few-shot remote sensing scene image classification methods, although they relieve the dependence on large-scale training data, still show weak generalization. A further challenge is that remote sensing scene samples exhibit small inter-class variation and large intra-class variation, which may lead to low robustness in few-shot learning. Our research therefore focuses on a novel self-supervised learning framework for few-shot remote sensing scene image classification, which improves the generalization ability of the model by learning richer intra-class and inter-class relationships.
Method
Our self-supervised learning framework consists of three modules: data preprocessing, feature extraction, and loss functions. 1) The data preprocessing module resizes and normalizes all inputs and constructs the support set and the query set required by few-shot learning. The support set contains a small number of labeled images, whereas the query set contains unlabeled samples; the few-shot learner must classify the query samples using the support set drawn from the same group of classes. The data preprocessing module repeatedly samples such support/query pairs to build multiple training tasks. 2) The feature extraction module extracts support features and query features from the inputs. It adopts a knowledge distillation structure with a teacher network and dual student networks: the teacher feature extractor is a ResNet-50, and the two student feature extractors are Conv-64 networks. 3) The loss function module produces three losses: a few-shot loss, a knowledge distillation loss, and a self-supervised contrastive loss. The few-shot loss, produced by metric-based meta-learning, uses the ground-truth labels to update the parameters of the student networks. The knowledge distillation loss is a KL (Kullback-Leibler) divergence that measures the similarity between the probability distributions of the dual student networks and the teacher network through soft labels. Distillation follows a two-stage training process: the teacher network is first trained with metric-based meta-learning until convergence; then the student networks and the teacher network are fed the same data, and the output of the teacher network guides the learning of the student networks through the distillation loss. Additionally, the self-supervised contrastive loss is computed from the distances between class centers; it performs an instance discrimination pretext task by reducing the distances within the same class and enlarging the distances between different classes (see the sketch below). These two self-supervised mechanisms enable the model to learn richer inter-class relationships and thus improve its generalization ability.
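A minimal sketch of the class-center contrastive idea described above is given below (a hypothetical PyTorch implementation; the cosine-similarity measure, temperature, and tensor shapes are illustrative assumptions rather than the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def class_center_contrastive_loss(features, labels, temperature=0.1):
    """Pull each embedding toward the center of its own class and push it
    away from the centers of other classes (an illustrative InfoNCE-style
    variant of the class-center contrast described above).
    features: (B, D) embeddings; labels: (B,) integer class ids."""
    features = F.normalize(features, dim=1)
    classes = labels.unique()                                    # sorted class ids
    centers = torch.stack([features[labels == c].mean(dim=0) for c in classes])
    centers = F.normalize(centers, dim=1)                        # class centers on the unit sphere
    logits = features @ centers.t() / temperature                # (B, n_classes) similarities
    targets = torch.searchsorted(classes, labels)                # index of each sample's own center
    return F.cross_entropy(logits, targets)
```

In the full framework this term would be combined with the few-shot loss and the knowledge distillation loss during training; the weighting among the three losses is not specified here.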
Result
Our method is evaluated on the North Western Polytechnical University-remote sensing image scene classification (NWPU-RESISC45) dataset, the aerial image dataset (AID), and the UC Merced land use dataset (UCMerced LandUse). The 5-way 1-shot and 5-way 5-shot tasks are carried out on each dataset. Our method is compared with five other methods, and the baseline is Relation Net*, a metric-based meta-learning method. For the 5-way 1-shot task, our method achieves 72.72%±0.15%, 68.62%±0.76%, and 68.21%±0.65% on the three datasets, respectively, which is 4.43%, 1.93%, and 0.68% higher than Relation Net*. For the 5-way 5-shot task, our results are 3.89%, 2.99%, and 1.25% higher than Relation Net*. The confusion matrices on AID and UCMerced LandUse are also visualized; they show that our self-supervised method reduces the classification errors among classes that are hard to distinguish.
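For reference, few-shot accuracies reported as "mean ± deviation" are typically obtained by averaging per-episode accuracy over many randomly sampled test tasks; the sketch below illustrates this common protocol (an assumption about the evaluation procedure, with a hypothetical model.accuracy interface, not the paper's actual script):

```python
import math

def evaluate(model, test_episodes):
    """Mean accuracy and 95% confidence interval over sampled test tasks
    (illustrative few-shot evaluation protocol)."""
    accs = [model.accuracy(episode) for episode in test_episodes]  # hypothetical per-episode accuracy
    mean = sum(accs) / len(accs)
    std = math.sqrt(sum((a - mean) ** 2 for a in accs) / (len(accs) - 1))
    return mean, 1.96 * std / math.sqrt(len(accs))                 # mean, 95% CI half-width
```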
Conclusion
We develop a self-supervised method to address the low robustness caused by data scarcity. It consists of a dual-student knowledge distillation mechanism and a self-supervised contrastive learning mechanism. Dual-student knowledge distillation uses the soft labels of the teacher network as the supervision information for the student networks, which improves the robustness of few-shot learning through richer inter-class and intra-class relationships. The self-supervised contrastive learning mechanism evaluates the similarity of different class centers in the representation space, enabling the model to learn better class centers. The experiments verify the feasibility of self-supervised distillation and contrastive learning. In future work, self-supervised transfer learning tasks can be further integrated with few-shot remote sensing scene image classification.
小样本学习；遥感场景分类；自监督学习；蒸馏学习；对比学习
few-shot learning; remote sensing scene classification; self-supervised learning; distillation learning; contrastive learning
Anwer R M, Khan F S, van de Weijer J, Molinier M and Laaksonen J. 2018. Binary patterns encoded convolutional neural networks for texture recognition and remote sensing scene classification. ISPRS Journal of Photogrammetry and Remote Sensing, 138: 74-85[DOI: 10.1016/j.isprsjprs.2018.01.023]
Bertinetto L, Henriques J F, Torr P and Vedaldi A. 2019. Meta-learning with differentiable closed-form solvers//Proceedings of the 7th International Conference on Learning Representations. New Orleans, USA: ICLR
Chen X L, Fan H Q, Girshick R and He K M. 2020. Improved baselines with momentum contrastive learning[EB/OL]. [2020-03-10]. https://arxiv.org/pdf/2003.04297.pdf
Cheng G, Han J W and Lu X Q. 2017. Remote sensing image scene classification: benchmark and state of the art. Proceedings of the IEEE, 105(10): 1865-1883[DOI: 10.1109/JPROC.2017.2675998]
Cheng G, Han J W, Zhou P C and Guo L. 2014. Multi-class geospatial object detection and geographic image classification based on collection of part detectors. ISPRS Journal of Photogrammetry and Remote Sensing, 98: 119-132[DOI: 10.1016/j.isprsjprs.2014.10.002]
Cheng G, Xie X X, Han J W, Guo L and Xia G S. 2020. Remote sensing image scene classification meets deep learning: challenges, methods, benchmarks, and opportunities. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 13: 3735-3756[DOI: 10.1109/JSTARS.2020.3005403]
Dong R C, Xu D Z, Jiao L C, Zhao J and An J G. 2020. A fast deep perception network for remote sensing scene classification. Remote Sensing, 12(4): #729[DOI: 10.3390/rs12040729]
Finn C, Abbeel P and Levine S. 2017. Model-agnostic meta-learning for fast adaptation of deep networks//Proceedings of the 34th International Conference on Machine Learning. Sydney, Australia: JMLR. org: 1126-1135
Gou J P, Yu B S, Maybank S J and Tao D C. 2021. Knowledge distillation: a survey. International Journal of Computer Vision, 129(6): 1789-1819[DOI: 10.1007/s11263-021-01453-z]
He K M, Zhang X Y, Ren S Q and Sun J. 2015. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778[DOI: 10.1109/CVPR.2016.90]
Hendrycks D, Mazeika M, Kadavath S and Song D. 2019. Using self-supervised learning can improve model robustness and uncertainty//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: NIPS: #1403
Ji D C, Jiang Y Z and Wang S T. 2019. Multi-source transfer learning method by balancing both the domains and instances. Acta Electronica Sinica, 47(3): 692-699
季鼎城, 蒋亦樟, 王士同. 2019. 基于域与样例平衡的多源迁移学习方法. 电子学报, 47(3): 692-699[DOI: 10.3969/j.issn.0372-2112.2019.03.025]
Krizhevsky A, Sutskever I and Hinton G E. 2012. ImageNet classification with deep convolutional neural networks//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, USA: Curran Associates Inc. : 1097-1105
Lee K, Maji S, Ravichandran A and Soatto S. 2019. Meta-learning with differentiable convex optimization//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 10649-10657[DOI: 10.1109/CVPR.2019.01091]
Li H F, Cui Z Q, Zhu Z Q, Chen L, Zhu J W, Huang H Z and Tao C. 2020. RS-MetaNet: deep meta metric learning for few-shot remote sensing scene classification[EB/OL]. [2020-09-28]. https://arxiv.org/pdf/2009.13364.pdf
Liu Y, Lei Y B, Fan J L, Wang F P, Gong Y C and Tian Q. 2021. Survey on image classification technology based on small sample learning. Acta Automatica Sinica, 47(2): 297-315
刘颖, 雷研博, 范九伦, 王富平, 公衍超, 田奇. 2021. 基于小样本学习的图像分类技术综述. 自动化学报, 47(2): 297-315[DOI: 10.16383/j.aas.c190720]
Simonyan K and Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition//Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: ICLR
Snell J, Swersky K and Zemel R. 2017. Prototypical networks for few-shot learning//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc. : 4080-4090
Sun Q R, Liu Y Y, Chua T S and Schiele B. 2019. Meta-transfer learning for few-shot learning//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 403-412[DOI: 10.1109/CVPR.2019.00049]
Sung F, Yang Y X, Zhang L, Xiang T, Torr P H S and Hospedales T M. 2018. Learning to compare: relation network for few-shot learning//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 1199-1208[DOI: 10.1109/CVPR.2018.00131]
Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V and Rabinovich A. 2015. Going deeper with convolutions//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 1-9[DOI: 10.1109/CVPR.2015.7298594]
Tao C, Lu W P, Qi J and Wang H. 2021. Spatial information considered network for scene classification. IEEE Geoscience and Remote Sensing Letters, 18(6): 984-988[DOI: 10.1109/LGRS.2020.2992929]
van den Oord A, Li Y Z and Vinyals O. 2018. Representation learning with contrastive predictive coding[EB/OL]. [2020-08-13]. https://arxiv.org/pdf/1807.03748.pdf
Vinyals O, Blundell C, Lillicrap T, Kavukcuoglu K and Wierstra D. 2016. Matching networks for one shot learning//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: NIPS: 3637-3645
Wang Y B, Zhang L Q, Deng H, Lu J W, Huang H Y, Zhang L, Liu J, Tang H and Xing X Y. 2017. Learning a discriminative distance metric with label consistency for scene classification. IEEE Transactions on Geoscience and Remote Sensing, 55(8): 4427-4440[DOI: 10.1109/TGRS.2017.2692280]
Wang Y Q, Yao Q M, Kwok J T and Ni L M. 2021. Generalizing from a few examples: a survey on few-shot learning. ACM Computing Surveys, 53(3): #63[DOI: 10.1145/3386252]
Xia G S, Hu J W, Hu F, Shi B G, Bai X, Zhong Y F, Zhang L P and Lu X Q. 2017. AID: a benchmark data set for performance evaluation of aerial scene classification. IEEE Transactions on Geoscience and Remote Sensing, 55(7): 3965-3981[DOI: 10.1109/TGRS.2017.2685945]
Xu P B, Sang J T and Lu D Y. 2021. Few shot image recognition based on class semantic similarity supervision. Journal of Image and Graphics, 26(7): 1594-1603
徐鹏帮, 桑基韬, 路冬媛. 2021. 类别语义相似性监督的小样本图像识别. 中国图象图形学报, 26(7): 1594-1603
Yang Y and Newsam S. 2010. Bag-of-visual-words and spatial extensions for land-use classification//Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems. San Jose, USA: ACM: 270-279[DOI: 10.1145/1869790.1869829]
Yang Y X, Li Y, Zhang R, Wang J B and Miao Z. 2020. Robust compare network for few-shot learning. IEEE Access, 8: 137966-137974[DOI: 10.1109/ACCESS.2020.3012720]
Zhu Q Q, Zhong Y F, Zhang L P and Li D R. 2017. Scene classification based on the fully sparse semantic topic model. IEEE Transactions on Geoscience and Remote Sensing, 55(10): 5525-5538[DOI: 10.1109/TGRS.2017.2709802]