A robust lightweight deep learning method for remote sensing scene image classification and retrieval under label noise
2021, Vol. 26, No. 12, Pages 2991-3004
Print publication date: 2021-12-16
Accepted: 2021-01-05
DOI: 10.11834/jig.200538
Yapeng Wang, Yang Li, Jiabao Wang, Xun Zhao, Zhuang Miao. A robust lightweight deep learning method for remote sensing scene image classification and retrieval under label noise[J]. Journal of Image and Graphics, 2021,26(12):2991-3004.
Objective
Deep-neural-network-based remote sensing image processing methods typically require large amounts of accurately annotated data for training; once label noise is present in the annotations, the performance of the networks degrades significantly. To address this noise-induced degradation, a noise-robust, lightweight deep learning method for remote sensing scene image classification and retrieval is proposed. The method performs classification and hash retrieval at the same time and effectively improves the classification and hash retrieval performance of deep neural networks on remote sensing data with noisy labels.
Method
A lightweight neural network is selected as the backbone; a double-branch structure is then designed so that classification and hash retrieval can be performed simultaneously; finally, a regularization method that sets a loss benchmark effectively reduces over-fitting to label noise, yielding a noise-robust classification and retrieval model.
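The phrase "setting a loss benchmark" is easiest to grasp with a concrete form. Below is a minimal sketch of such a regularizer, assuming it behaves like the flooding technique of Ishida et al. (2020) cited in the references: the classification loss is prevented from falling below a fixed benchmark value. The function name and the default benchmark value are illustrative placeholders, not values taken from the paper.

```python
import torch.nn.functional as F

def benchmarked_loss(logits, labels, benchmark=0.3):
    # Ordinary cross-entropy classification loss.
    ce = F.cross_entropy(logits, labels)
    # Loss benchmark (flooding-style): once `ce` drops below `benchmark`,
    # (ce - benchmark) turns negative and the gradient direction flips,
    # so the training loss hovers around the benchmark instead of
    # collapsing to zero and memorizing noisy labels.
    return (ce - benchmark).abs() + benchmark
```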
Result
The proposed method is evaluated on two public remote sensing scene datasets and compared with eight existing methods. On the AID (aerial image datasets) benchmark, its classification accuracy averaged over all noise ratios is 7.8% higher than that of the second-best method; on the NWPU-RESISC45 dataset (a benchmark created by Northwestern Polytechnical University for remote sensing image scene classification covering 45 scene classes), it is 8.1% higher on average. In terms of efficiency, the inference speed is 2.8 times that of the CLEOT (classification loss with entropic optimal transport) method, while the FLOPs and parameter count are both below 5% of CLEOT's. In the remote sensing image hash retrieval task on AID, the mean average precision (mAP) is on average 5.9% higher than that of the MiLaN (metric-learning based deep hashing network) method across three hash-bit lengths.
Conclusion
The proposed method handles remote sensing image classification and hash retrieval simultaneously and, while remaining lightweight and efficient, effectively improves the robustness of deep neural networks on remote sensing data with noisy labels.
Objective
With the development of deep learning technology, deep neural networks have been widely used in various remote sensing tasks such as image retrieval, scene classification, and change detection. Although these deep learning methods constantly refresh the accuracy records of remote sensing applications on specific datasets, they require massive data with millions of reliable annotations, which is impractical or expensive in real-world applications; conversely, when label accuracy is too low, their performance declines sharply. To reduce labeling cost and speed up annotation, researchers have proposed a variety of greedy annotation strategies that improve labeling efficiency via clustering and crowdsourcing, but the performance of deep learning methods degrades dramatically once label noise is introduced into the dataset. It is therefore necessary to construct a noise-robust deep learning method for remote sensing image processing in order to improve generalization. A noise-robust and lightweight deep learning method for remote sensing scene classification and retrieval is proposed to resolve this performance degradation; it effectively improves classification and hash retrieval performance on remote sensing datasets with label noise. Furthermore, the proposed method completes the classification and hash retrieval tasks at the same time.
Method
First, a lightweight deep neural network named mobile GPU-aware network C (MoGA-C), proposed by Xiaomi AI Lab, is used as the backbone to keep the model lightweight. MoGA-C is obtained with the mobile GPU-aware (MoGA) neural architecture search algorithm, and a variety of lightweight design techniques are integrated during the search to ensure the compactness of the network. Next, a double-branch structure is attached to the backbone to perform the classification and retrieval tasks simultaneously; it not only avoids the drop in classification performance caused by inserting a hash layer, but also increases classification accuracy under label noise by fusing the predictions of the two branches. Finally, the whole network is fine-tuned during training to strengthen its learning ability, which effectively improves classification performance under low-ratio label noise; a loss benchmark is set during fine-tuning to bound the training loss from below, which reduces over-fitting to label noise in the middle and later stages of training and is particularly effective under high-ratio noise.
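To make the double-branch design above more concrete, the sketch below shows one plausible arrangement in PyTorch: a shared backbone (MoGA-C in the paper) feeding a classification branch and a hash branch, whose predictions are fused at inference. The layer sizes, the classifier placed on top of the hash codes, and the logit-averaging fusion are assumptions for illustration, not the authors' exact design.

```python
import torch
import torch.nn as nn

class DualBranchHead(nn.Module):
    """Illustrative double-branch head for joint classification and hashing."""

    def __init__(self, backbone, feat_dim, num_classes, hash_bits):
        super().__init__()
        self.backbone = backbone                              # e.g. a MoGA-C feature extractor
        self.cls_branch = nn.Linear(feat_dim, num_classes)    # classification branch
        self.hash_branch = nn.Sequential(                     # hash branch
            nn.Linear(feat_dim, hash_bits),
            nn.Tanh(),                                        # relaxed binary codes in (-1, 1)
        )
        self.hash_cls = nn.Linear(hash_bits, num_classes)     # classifier on top of the codes

    def forward(self, x):
        feat = self.backbone(x)
        cls_logits = self.cls_branch(feat)                    # prediction of the classification branch
        codes = self.hash_branch(feat)                        # relaxed hash codes
        hash_logits = self.hash_cls(codes)                    # prediction derived from the hash branch
        fused_logits = 0.5 * (cls_logits + hash_logits)       # fuse the two branches' predictions
        return cls_logits, hash_logits, codes, fused_logits

    @torch.no_grad()
    def binary_codes(self, x):
        # Binarize the relaxed codes with sign() for Hamming-distance retrieval.
        return torch.sign(self.hash_branch(self.backbone(x)))
```

In such a setup, a classification loss (for instance the benchmarked cross-entropy sketched earlier) could be applied to both branches' logits during training, while retrieval would rank images by the Hamming distance between their binary codes.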
Result
The proposed method is compared with eight state-of-the-art methods on two public remote sensing classification datasets and performs well under different noise ratios: its accuracy is on average 7.8% higher than the second-best method on the aerial image datasets (AID) benchmark and 8.1% higher on the benchmark created by Northwestern Polytechnical University for remote sensing image scene classification covering 45 scene classes (NWPU-RESISC45). Its inference speed is 2.8 times that of the classification loss with entropic optimal transport (CLEOT) method, while its floating point operations (FLOPs) and parameter count are both below 5% of CLEOT's. In the remote sensing image retrieval task on AID, the method improves the mean average precision by 5.9% on average over the metric-learning based deep hashing network (MiLaN) method across three hash-bit lengths.
Conclusion
A lightweight, noise-robust method for remote sensing scene classification and retrieval is presented to resolve the performance degradation of remote sensing image processing methods under label noise. The proposed method performs the classification and hash retrieval tasks at the same time and effectively improves both under label noise. First, a lightweight network is chosen as the backbone to keep the model compact. Second, a parallel double-branch structure is designed to complete the classification and hash retrieval tasks simultaneously, and classification performance is further improved by combining the predictions of the two branches. Finally, the training loss is kept above a positive value by setting a loss benchmark, which effectively reduces over-fitting to label noise. Classification and hash retrieval experiments on two public datasets show that the proposed method is not only highly efficient but also robust to different ratios of label noise.
Keywords: label noise; robust learning; image classification; image retrieval; hash learning; lightweight network
Cheng G, Han J W and Lu X Q. 2017. Remote sensing image scene classification: benchmark and state of the art. Proceedings of the IEEE, 105(10): 1865-1883[DOI:10.1109/JPROC.2017.2675998]
Chu X X, Zhang B and Xu R J. 2020. MoGA: searching beyond MobileNetV3//Proceedings of ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Barcelona, Spain: IEEE: 4042-4046[DOI: 10.1109/ICASSP40776.2020.9054428]
Damodaran B B, Flamary R, Seguy V and Courty N. 2020. An entropic optimal transport loss for learning deep neural networks under label noise in remote sensing images. Computer Vision and Image Understanding, 191: #102863[DOI:10.1016/j.cviu.2019.102863]
Demir B and Bruzzone L. 2016. Hashing-based scalable remote sensing image search and retrieval in large archives. IEEE Transactions on Geoscience and Remote Sensing, 54(2): 892-904[DOI:10.1109/TGRS.2015.2469138]
Dong R C, Xu D Z, Jiao L C, Zhao J and An J G. 2020. A fast deep perception network for remote sensing scene classification. Remote Sensing, 12(4): #729[DOI:10.3390/rs12040729]
Ghosh A, Kumar H and Sastry P S. 2017. Robust loss functions under label noise for deep neural networks//Proceedings of the 31st AAAI Conference on Artificial Intelligence. San Francisco, USA: AAAI: 1919-1925
Ghosh A, Manwani N and Sastry P S. 2015. Making risk minimization tolerant to label noise. Neurocomputing, 160: 93-107[DOI:10.1016/j.neucom.2014.09.081]
Hu J, Shen L and Sun G. 2018. Squeeze-and-Excitation networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7132-7141[DOI: 10.1109/CVPR.2018.00745]
Huang B H, Lu K K, Audebert N, Khalel A, Tarabalka Y, Malof J, Boulch A, Le Saux B, Collins L, Bradbury K, Lefèvre S and ElSaban M. 2018. Large-scale semantic classification: outcome of the first year of Inria aerial image labeling benchmark//Proceedings of IGARSS 2018-2018 IEEE International Geoscience and Remote Sensing Symposium. Valencia, Spain: IEEE: 6947-6950[DOI: 10.1109/IGARSS.2018.8518525]
Ishida T, Yamane I, Sakai T, Niu G and Sugiyama M. 2020. Do we need zero training loss after achieving zero training error?//Proceedings of the 37th International Conference on Machine Learning. [s. l.]: [s. n.].
Jiang L, Zhou Z Y, Leung T, Li L J and Li F F. 2018. MentorNet: learning data-driven curriculum for very deep neural networks on corrupted labels[EB/OL]. [2020-08-03]. https://export.arxiv.org/pdf/1712.05055.pdf
Jin P, Xia G S, Hu F, Lu Q K and Zhang L P. 2018. AID++: an updated version of AID on scene classification//Proceedings of IGARSS 2018-2018 IEEE International Geoscience and Remote Sensing Symposium. Valencia, Spain: IEEE: 4721-4724[DOI: 10.1109/IGARSS.2018.8518882]
Kellenberger B, Marcos D and Tuia D. 2018. Detecting mammals in UAV images: best practices to address a substantially imbalanced dataset with deep learning[EB/OL]. [2020-08-03]. https://arxiv.org/pdf/1806.11368.pdf
Li Y S, Zhang Y J and Zhu Z H. 2021. Error-tolerant deep learning for remote sensing image scene classification. IEEE Transactions on Cybernetics, 51(4): 1756-1768[DOI:10.1109/TCYB.2020.2989241]
Liu Y S, Liu Y B and Ding L W. 2019. Scene classification by coupling convolutional neural networks with wasserstein distance. IEEE Geoscience and Remote Sensing Letters, 16(5): 722-726[DOI:10.1109/LGRS.2018.2883310]
Masnadi-Shirazi H and Vasconcelos N. 2008. On the design of loss functions for classification: theory, robustness to outliers, and SavageBoost//Advances in Neural Information Processing Systems 21 (NIPS 2008). Vancouver, Canada: NIPS: 1049-1056
Patrini G, Rozza A, Menon A K, Nock R and Qu L Z. 2017. Making deep neural networks robust to label noise: a loss correction approach//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 2233-2241[DOI: 10.1109/CVPR.2017.240]
Reed S, Lee H, Anguelov D, Szegedy C, Erhan D and Rabinovich A. 2015. Training deep neural networks on noisy labels with bootstrapping//Accepted as a workshop contribution at ICLR 2015. San Diego, USA: ICLR
Ren M Y, Zeng W Y, Yang B and Urtasun R. 2018. Learning to reweight examples for robust deep learning//Proceedings of the 35th International Conference on Machine Learning. Stockholm, Sweden: PMLR: 4331-4340
Roy S, Sangineto E, Demir B and Sebe N. 2019. Metric-learning based deep hashing network for content based retrieval of remote sensing images[EB/OL]. [2020-08-03]. https://arxiv.org/pdf/1904.01258.pdf
Sandler M, Howard A, Zhu M L, Zhmoginov A and Chen L C. 2018. MobileNetV2: inverted residuals and linear bottlenecks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4510-4520[DOI: 10.1109/CVPR.2018.00474]
Schroff F, Kalenichenko D and Philbin J. 2015. FaceNet: a unified embedding for face recognition and clustering//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE: 815-823[DOI: 10.1109/CVPR.2015.7298682]
Song W W, Li S T and Benediktsson J A. 2019. Deep hashing learning for visual and semantic retrieval of remote sensing images[EB/OL]. [2020-08-03]. https://arxiv.org/pdf/1909.04614.pdf
Vahdat A. 2017. Toward robustness against label noise in training deep discriminative neural networks//Advances in Neural Information Processing Systems 30 (NIPS 2017). Long Beach, USA: NIPS: 5596-5605
van Rooyen B, Menon A and Williamson R C. 2015. Learning with symmetric label noise: the importance of being unhinged//Advances in Neural Information Processing Systems 28 (NIPS 2015). Montreal, Canada: NIPS: 10-18
Xia G S, Hu J W, Hu F, Shi B G, Bai X, Zhong Y F, Zhang L P and Lu X P. 2017. AID: a benchmark data set for performance evaluation of aerial scene classification. IEEE Transactions on Geoscience and Remote Sensing, 55(7): 3965-3981[DOI:10.1109/TGRS.2017.2685945]
Xia G S, Wang Z F, Xiong C M and Zhang L P. 2015. Accurate annotation of remote sensing images via active spectral clustering with little expert knowledge. Remote Sensing, 7(11): 15014-15045[DOI:10.3390/rs71115014]
Xiao T, Xia T, Yang Y, Huang C and Wang X G. 2015. Learning from massive noisy labeled data for image classification//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE: 2691-2699[DOI: 10.1109/CVPR.2015.7298885]
Yang Y and Newsam S. 2010. Bag-of-visual-words and spatial extensions for land-use classification//Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems. San Jose, USA: ACM: 270-279[DOI: 10.1145/1869790.1869829]
Zhang C Y, Bengio S, Hardt M, Recht B and Vinyals O. 2017. Understanding deep learning requires rethinking generalization//Proceedings of the 5th International Conference on Learning Representations. Toulon, France: ICLR: 1-15