自监督学习下小样本遥感图像场景分类
Self-supervised learning based few-shot remote sensing scene image classification
2022年，第27卷，第11期，页码：3371-3381
纸质出版日期: 2022-11-16 ,
录用日期: 2021-10-24
DOI: 10.11834/jig.210486
张睿, 杨义鑫, 李阳, 王家宝, 苗壮, 李航, 王梓祺. 自监督学习下小样本遥感图像场景分类[J]. 中国图象图形学报, 2022,27(11):3371-3381.
Rui Zhang, Yixin Yang, Yang Li, Jiabao Wang, Zhuang Miao, Hang Li, Ziqi Wang. Self-supervised learning based few-shot remote sensing scene image classification[J]. Journal of Image and Graphics, 2022,27(11):3371-3381.
目的
卷积神经网络(convolutional neural network,CNN)在遥感场景图像分类中广泛应用,但缺乏训练数据依然是不容忽视的问题。小样本遥感场景分类是指模型只需利用少量样本训练即可完成遥感场景图像分类任务。虽然现有基于元学习的小样本遥感场景图像分类方法可以摆脱大数据训练的依赖,但模型的泛化能力依然较弱。为了解决这一问题,本文提出一种基于自监督学习的小样本遥感场景图像分类方法来增加模型的泛化能力。
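为便于理解小样本分类的任务设定，下面给出一个 N-way K-shot 任务（支持集/查询集）采样的示意代码（仅为概念性示例，类别数、样本数等均为假设参数，并非论文的具体实现）：

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way=5, k_shot=1, q_query=15):
    """从带标签数据集中随机构造一个 N-way K-shot 小样本任务。
    dataset: [(image, label), ...]；返回支持集与查询集两个列表。"""
    by_class = defaultdict(list)
    for image, label in dataset:
        by_class[label].append(image)
    classes = random.sample(list(by_class), n_way)                 # 随机抽取 N 个类别
    support, query = [], []
    for new_label, c in enumerate(classes):
        images = random.sample(by_class[c], k_shot + q_query)
        support += [(img, new_label) for img in images[:k_shot]]   # 每类 K 个带标签样本
        query += [(img, new_label) for img in images[k_shot:]]     # 其余样本作为查询样本
    return support, query
```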
方法
本文方法分为两个阶段。首先,使用元学习训练老师网络直到收敛;然后,双学生网络和老师网络对同一个输入进行预测。老师网络的预测结果会通过蒸馏损失指导双学生网络的训练。另外,在图像特征进入分类器之前,自监督对比学习通过度量同类样本的类中心距离,使模型学习到更明确的类间边界。两种自监督机制能够使模型学习到更丰富的类间关系,从而提高模型的泛化能力。
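为说明老师网络如何通过软标签指导学生网络，这里给出蒸馏损失的一个极简示意（基于 PyTorch 的假设性写法，温度参数 T 等均为示例设定，并非论文的原始实现）：

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL 散度蒸馏损失示意：使学生网络的预测分布逼近老师网络的软标签分布。
    T 为温度超参数（假设值），用于软化两个概率分布。"""
    soft_teacher = F.softmax(teacher_logits / T, dim=1)        # 老师网络软标签
    log_student = F.log_softmax(student_logits / T, dim=1)     # 学生网络对数概率
    # batchmean 按样本数平均；乘 T^2 以保持梯度量级（知识蒸馏中的常用做法）
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * (T * T)
```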
结果
本文在NWPU-RESISC45(North Western Polytechnical University-remote sensing image scene classification)、AID(aerial image dataset)和UCMerced LandUse(UC merced land use dataset)3个数据集上进行实验。在5-way 1-shot条件下，本文方法的精度在3个数据集上分别达到了72.72%±0.15%、68.62%±0.76%和68.21%±0.65%，比Relation Net*模型分别提高了4.43%、1.93%和0.68%。随着可用标签的增加，本文方法的提升作用依然能够保持，在5-way 5-shot条件下，本文方法的精度比Relation Net*分别提高3.89%、2.99%和1.25%。
结论
本文方法可以使模型学习到更丰富的类内类间关系,有效提升小样本遥感场景图像分类模型的泛化能力。
Objective
Convolutional neural networks (CNNs) have been widely used in remote sensing scene image classification, but data-driven models suffer from overfitting and low robustness when training data are scarce. The shortage of labeled samples remains a major obstacle to training models for remote sensing scene image classification, so an effective algorithm that can adapt to small-scale data is required. Few-shot learning can improve the generalization ability of a model trained with only a few labeled samples. However, existing meta-learning-based few-shot remote sensing scene image classification methods, although they relieve the dependence on large-scale training data, still show weak generalization. A further challenge is that remote sensing scene samples exhibit small inter-class variation and large intra-class variation, which may lead to low robustness in few-shot learning. Our research therefore focuses on a novel self-supervised learning framework for few-shot remote sensing scene image classification, which improves the generalization ability of the model by learning richer intra-class and inter-class relationships.
Method
Our self-supervised learning framework consists of three modules: data preprocessing, feature extraction, and loss functions. 1) The data preprocessing module resizes and normalizes all inputs and constructs the support set and the query set required by few-shot learning. The support set contains a small number of labeled images, whereas the query set contains unlabeled samples; the few-shot learner must classify the query samples using the support set drawn from the same group of classes. The data preprocessing module repeatedly samples such support/query pairs to build multiple training tasks. 2) The feature extraction module extracts support features and query features from the inputs. It adopts a knowledge distillation structure with a teacher network and dual student networks: the teacher feature extractor is a ResNet-50, and the two student feature extractors are Conv-64 networks. 3) The loss function module produces three losses: a few-shot loss, a knowledge distillation loss, and a self-supervised contrastive loss. The few-shot loss, produced by metric-based meta-learning, uses the ground-truth labels to update the parameters of the student networks. The knowledge distillation loss is a KL (Kullback-Leibler) divergence that measures the similarity between the probability distributions of the dual student networks and the teacher network through soft labels. Distillation follows a two-stage training process: the teacher network is first trained with metric-based meta-learning until convergence; then the student networks and the teacher network are fed the same data, and the output of the teacher network guides the learning of the student networks through the distillation loss. Additionally, the self-supervised contrastive loss is computed from the distances between class centers; it performs an instance discrimination pretext task by reducing the distances within the same class and enlarging the distances between different classes (see the sketch below). These two self-supervised mechanisms enable the model to learn richer inter-class relationships and thus improve its generalization ability.
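A minimal sketch of the class-center contrastive idea described above is given below (a hypothetical PyTorch implementation; the cosine-similarity measure, temperature, and tensor shapes are illustrative assumptions rather than the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def class_center_contrastive_loss(features, labels, temperature=0.1):
    """Pull each embedding toward the center of its own class and push it
    away from the centers of other classes (an illustrative InfoNCE-style
    variant of the class-center contrast described above).
    features: (B, D) embeddings; labels: (B,) integer class ids."""
    features = F.normalize(features, dim=1)
    classes = labels.unique()                                    # sorted class ids
    centers = torch.stack([features[labels == c].mean(dim=0) for c in classes])
    centers = F.normalize(centers, dim=1)                        # class centers on the unit sphere
    logits = features @ centers.t() / temperature                # (B, n_classes) similarities
    targets = torch.searchsorted(classes, labels)                # index of each sample's own center
    return F.cross_entropy(logits, targets)
```

In the full framework this term would be combined with the few-shot loss and the knowledge distillation loss during training; the weighting among the three losses is not specified here.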
Result
Our method is evaluated on the North Western Polytechnical University-remote sensing image scene classification (NWPU-RESISC45) dataset, the aerial image dataset (AID), and the UC Merced land use dataset (UCMerced LandUse). The 5-way 1-shot and 5-way 5-shot tasks are carried out on each dataset. Our method is compared with five other methods, and the baseline is Relation Net*, a metric-based meta-learning method. For the 5-way 1-shot task, our method achieves 72.72%±0.15%, 68.62%±0.76%, and 68.21%±0.65% on the three datasets, respectively, which is 4.43%, 1.93%, and 0.68% higher than Relation Net*. For the 5-way 5-shot task, our results are 3.89%, 2.99%, and 1.25% higher than Relation Net*. The confusion matrices on AID and UCMerced LandUse are also visualized; they show that our self-supervised method reduces the classification errors among classes that are hard to distinguish.
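For reference, few-shot accuracies reported as "mean ± deviation" are typically obtained by averaging per-episode accuracy over many randomly sampled test tasks; the sketch below illustrates this common protocol (an assumption about the evaluation procedure, with a hypothetical model.accuracy interface, not the paper's actual script):

```python
import math

def evaluate(model, test_episodes):
    """Mean accuracy and 95% confidence interval over sampled test tasks
    (illustrative few-shot evaluation protocol)."""
    accs = [model.accuracy(episode) for episode in test_episodes]  # hypothetical per-episode accuracy
    mean = sum(accs) / len(accs)
    std = math.sqrt(sum((a - mean) ** 2 for a in accs) / (len(accs) - 1))
    return mean, 1.96 * std / math.sqrt(len(accs))                 # mean, 95% CI half-width
```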
Conclusion
We develop a self-supervised method to address the low robustness caused by data scarcity. It consists of a dual-student knowledge distillation mechanism and a self-supervised contrastive learning mechanism. Dual-student knowledge distillation uses the soft labels of the teacher network as the supervision information for the student networks, which improves the robustness of few-shot learning through richer inter-class and intra-class relationships. The self-supervised contrastive learning mechanism evaluates the similarity of different class centers in the representation space, enabling the model to learn better class centers. The experiments verify the feasibility of self-supervised distillation and contrastive learning. In future work, self-supervised transfer learning tasks can be further integrated with few-shot remote sensing scene image classification.
小样本学习；遥感场景分类；自监督学习；蒸馏学习；对比学习
few-shot learning; remote sensing scene classification; self-supervised learning; distillation learning; contrastive learning
Anwer R M, Khan F S, van de Weijer J, Molinier M and Laaksonen J. 2018. Binary patterns encoded convolutional neural networks for texture recognition and remote sensing scene classification. ISPRS Journal of Photogrammetry and Remote Sensing, 138: 74-85[DOI: 10.1016/j.isprsjprs.2018.01.023]
Bertinetto L, Henriques J F, Torr P and Vedaldi A. 2019. Meta-learning with differentiable closed-form solvers//Proceedings of the 7th International Conference on Learning Representations. New Orleans, USA: ICLR
Chen X L, Fan H Q, Girshick R and He K M. 2020. Improved baselines with momentum contrastive learning[EB/OL]. [2020-03-10]. https://arxiv.org/pdf/2003.04297.pdf
Cheng G, Han J W and Lu X Q. 2017. Remote sensing image scene classification: benchmark and state of the art. Proceedings of the IEEE, 105(10): 1865-1883[DOI: 10.1109/JPROC.2017.2675998]
Cheng G, Han J W, Zhou P C and Guo L. 2014. Multi-class geospatial object detection and geographic image classification based on collection of part detectors. ISPRS Journal of Photogrammetry and Remote Sensing, 98: 119-132[DOI: 10.1016/j.isprsjprs.2014.10.002]
Cheng G, Xie X X, Han J W, Guo L and Xia G S. 2020. Remote sensing image scene classification meets deep learning: challenges, methods, benchmarks, and opportunities. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 13: 3735-3756[DOI: 10.1109/JSTARS.2020.3005403]
Dong R C, Xu D Z, Jiao L C, Zhao J and An J G. 2020. A fast deep perception network for remote sensing scene classification. Remote Sensing, 12(4): #729[DOI: 10.3390/rs12040729]
Finn C, Abbeel P and Levine S. 2017. Model-agnostic meta-learning for fast adaptation of deep networks//Proceedings of the 34th International Conference on Machine Learning. Sydney, Australia: JMLR. org: 1126-1135
Gou J P, Yu B S, Maybank S J and Tao D C. 2021. Knowledge distillation: a survey. International Journal of Computer Vision, 129(6): 1789-1819[DOI: 10.1007/s11263-021-01453-z]
He K M, Zhang X Y, Ren S Q and Sun J. 2015. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778[DOI: 10.1109/CVPR.2016.90]
Hendrycks D, Mazeika M, Kadavath S and Song D. 2019. Using self-supervised learning can improve model robustness and uncertainty//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: NIPS: #1403
Ji D C, Jiang Y Z and Wang S T. 2019. Multi-source transfer learning method by balancing both the domains and instances. Acta Electronica Sinica, 47(3): 692-699
季鼎城, 蒋亦樟, 王士同. 2019. 基于域与样例平衡的多源迁移学习方法. 电子学报, 47(3): 692-699[DOI: 10.3969/j.issn.0372-2112.2019.03.025]
Krizhevsky A, Sutskever I and Hinton G E. 2012. ImageNet classification with deep convolutional neural networks//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, USA: Curran Associates Inc. : 1097-1105
Lee K, Maji S, Ravichandran A and Soatto S. 2019. Meta-learning with differentiable convex optimization//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 10649-10657[DOI: 10.1109/CVPR.2019.01091]
Li H F, Cui Z Q, Zhu Z Q, Chen L, Zhu J W, Huang H Z and Tao C. 2020. RS-MetaNet: deep meta metric learning for few-shot remote sensing scene classification[EB/OL]. [2020-09-28]. https://arxiv.org/pdf/2009.13364.pdf
Liu Y, Lei Y B, Fan J L, Wang F P, Gong Y C and Tian Q. 2021. Survey on image classification technology based on small sample learning. Acta Automatica Sinica, 47(2): 297-315
刘颖, 雷研博, 范九伦, 王富平, 公衍超, 田奇. 2021. 基于小样本学习的图像分类技术综述. 自动化学报, 47(2): 297-315[DOI: 10.16383/j.aas.c190720]
Simonyan K and Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition//Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: ICLR
Snell J, Swersky K and Zemel R. 2017. Prototypical networks for few-shot learning//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc. : 4080-4090
Sun Q R, Liu Y Y, Chua T S and Schiele B. 2019. Meta-transfer learning for few-shot learning//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 403-412[DOI: 10.1109/CVPR.2019.00049]
Sung F, Yang Y X, Zhang L, Xiang T, Torr P H S and Hospedales T M. 2018. Learning to compare: relation network for few-shot learning//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 1199-1208[DOI: 10.1109/CVPR.2018.00131]
Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V and Rabinovich A. 2015. Going deeper with convolutions//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 1-9[DOI: 10.1109/CVPR.2015.7298594]
Tao C, Lu W P, Qi J and Wang H. 2021. Spatial information considered network for scene classification. IEEE Geoscience and Remote Sensing Letters, 18(6): 984-988[DOI: 10.1109/LGRS.2020.2992929]
van den Oord A, Li Y Z and Vinyals O. 2018. Representation learning with contrastive predictive coding[EB/OL]. [2020-08-13]. https://arxiv.org/pdf/1807.03748.pdf
Vinyals O, Blundell C, Lillicrap T, Kavukcuoglu K and Wierstra D. 2016. Matching networks for one shot learning//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: NIPS: 3637-3645
Wang Y B, Zhang L Q, Deng H, Lu J W, Huang H Y, Zhang L, Liu J, Tang H and Xing X Y. 2017. Learning a discriminative distance metric with label consistency for scene classification. IEEE Transactions on Geoscience and Remote Sensing, 55(8): 4427-4440[DOI: 10.1109/TGRS.2017.2692280]
Wang Y Q, Yao Q M, Kwok J T and Ni L M. 2021. Generalizing from a few examples: a survey on few-shot learning. ACM Computing Surveys, 53(3): #63[DOI: 10.1145/3386252]
Xia G S, Hu J W, Hu F, Shi B G, Bai X, Zhong Y F, Zhang L P and Lu X Q. 2017. AID: a benchmark data set for performance evaluation of aerial scene classification. IEEE Transactions on Geoscience and Remote Sensing, 55(7): 3965-3981[DOI: 10.1109/TGRS.2017.2685945]
Xu P B, Sang J T and Lu D Y. 2021. Few shot image recognition based on class semantic similarity supervision. Journal of Image and Graphics, 26(7): 1594-1603
徐鹏帮, 桑基韬, 路冬媛. 2021. 类别语义相似性监督的小样本图像识别. 中国图象图形学报, 26(7): 1594-1603
Yang Y and Newsam S. 2010. Bag-of-visual-words and spatial extensions for land-use classification//Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems. San Jose, USA: ACM: 270-279[DOI: 10.1145/1869790.1869829]
Yang Y X, Li Y, Zhang R, Wang J B and Miao Z. 2020. Robust compare network for few-shot learning. IEEE Access, 8: 137966-137974[DOI: 10.1109/ACCESS.2020.3012720]
Zhu Q Q, Zhong Y F, Zhang L P and Li D R. 2017. Scene classification based on the fully sparse semantic topic model. IEEE Transactions on Geoscience and Remote Sensing, 55(10): 5525-5538[DOI: 10.1109/TGRS.2017.2709802]