用于遥感场景分类的全局-局部特征耦合网络
Global-local feature coupling network for remote sensing scene classification
2024年, 页码: 1-13
网络出版日期: 2024-10-16
DOI: 10.11834/jig.240228
王俊杰,李伟,张蒙蒙等.用于遥感场景分类的全局-局部特征耦合网络[J].中国图象图形学报,
Wang Junjie,Li Wei,Zhang Mengmeng,et al.Global-local feature coupling network for remote sensing scene classification[J].Journal of Image and Graphics,
目的
卷积神经网络(CNN)因其强大的特征归纳和学习能力,在遥感场景分类任务中获得了广泛关注。然而,卷积采用的是一种局部归纳机制,这阻碍了全局依赖关系的获取,限制了模型的性能。视觉transformer(ViT)的核心在于自注意力机制,它能够建立全局依赖关系,这一属性可以缓解基于卷积神经网络的算法的局限性。然而,自注意力机制也带来了更大的计算代价:在计算成对的key-value之间的交互关系时,它需要在所有空间位置上进行相关计算,从而带来巨大的计算压力和内存负担。此外,自注意力机制关注于建模全局信息,而忽略了局部特征细节。为了解决上述问题,本文提出了一种全局-局部特征耦合网络用于遥感场景分类。
方法
本文方法分为两个方面。一方面为了缓解自注意力机制所带来的计算压力,本文提出了一种双粒度注意力来动态感知数据内容,从而实现更灵活的计算分配。另一方面,为了更好地结合全局和局部特征,本文利用了一种自适应耦合模块来实现全局和局部特征的融合。
结果
本文在UCM、AID和NWPU-RESISC45三个数据集上进行了实验。为了更好地展示所提方法的优越性,与当前先进的基于卷积神经网络和基于视觉transformer的方法进行了对比。在不同的训练比率下,所提方法在三个数据集上分别取得了99.71%(UCM数据集)、94.75%(AID数据集,训练比率20%)、97.05%(AID数据集,训练比率50%)、92.11%(NWPU-RESISC45数据集,训练比率10%)以及94.10%(NWPU-RESISC45数据集,训练比率20%)的最优分类表现,相较于其他对比方法分别取得了至少0.14%、0.06%、0.27%、0.43%以及0.21%的性能提升。
结论
本文所提出的方法不仅缓解了自注意力机制中沉重的计算和内存负担,同时将局部细节特征与全局信息相结合,有效提升了模型的特征学习能力。
Objective
Convolutional neural networks (CNNs) have received significant attention in remote sensing scene classification tasks due to their powerful feature induction and learning capabilities. However, since convolution adopts a local induction mechanism, it hinders the acquisition of global dependencies and limits model performance. At the same time, the vision transformer (ViT) has gained considerable popularity in various visual tasks, including widespread attention in the field of remote sensing image processing. The core of ViT lies in the self-attention mechanism, which enables the establishment of global dependencies; this property alleviates the limitations of CNN-based algorithms. However, self-attention also introduces higher computational costs: when calculating interactions between key-value pairs, it requires computations across all spatial locations, leading to a huge computational burden and a heavy memory footprint. Furthermore, self-attention focuses on modeling global information while ignoring local detail features. To solve the above problems, this paper proposes a global-local feature coupling network for remote sensing scene classification.
Method
The proposed global-local feature coupling network consists of multiple convolutional layers and dual-channel coupling modules. Each dual-channel coupling module includes a ViT branch based on dual-grained attention and a branch with depth-wise separable convolution. Feature fusion is then achieved using the proposed adaptive coupling module, facilitating an effective combination of global and local features and thereby enhancing the model's capability to understand remote sensing scene images. On the one hand, to alleviate the huge computational burden caused by self-attention, a dual-grained attention is proposed to dynamically perceive data content and thereby achieve more flexible computation allocation. The goal of the dual-grained attention is to enable each query to focus on a small subset of key-value pairs that are semantically most relevant. To efficiently achieve global attention by identifying the most relevant key-value pairs, less relevant key-value pairs are first filtered out at a coarse-grained region level. This is accomplished by constructing a regional correlation graph and pruning it so that only the top-k regions with the highest correlation are retained; as a result, each region only needs to attend to its top-k most relevant regions. Once the attention regions are determined, fine-grained key/value tokens are gathered to perform token-to-token attention, realizing a dynamic, query-aware sparse attention. In summary, for each query, irrelevant key-value pairs are first filtered out at the coarse-grained region level, and fine-grained token-to-token attention is then applied within the set of retained candidate regions. On the other hand, to achieve a better integration of global and local features, an adaptive coupling module is utilized to combine the CNN and ViT branches. It consists of two coupling operations, spatial coupling and channel coupling, which take the outputs of the ViT and depth-wise separable convolution branches as input and adaptively reweight the features along the global and local feature dimensions. In this way, the global and local information contained in a scene image can be aggregated within the same module, achieving a more comprehensive fusion.
Result
Experiments are conducted on the UC Merced land use dataset (UCM), the aerial image dataset (AID), and the Northwestern Polytechnical University remote sensing image scene classification dataset (NWPU-RESISC45). To demonstrate the superiority of the proposed method, comparisons are made with state-of-the-art CNN-based and ViT-based methods. Under different training ratios, the proposed method achieves the best classification results, with accuracies of 99.71% ± 0.20% (UCM), 94.75% ± 0.09% (AID, 20% training ratio), 97.05% ± 0.12% (AID, 50% training ratio), 92.11% ± 0.20% (NWPU-RESISC45, 10% training ratio), and 94.10% ± 0.17% (NWPU-RESISC45, 20% training ratio). In addition, to demonstrate more intuitively the positive effect of the two proposed modules on the experimental results, a series of ablation experiments is performed on the three datasets. The experimental results demonstrate that the proposed dual-grained attention and adaptive coupling module alleviate the computational pressure of the model and improve its classification performance.
Conclusion
In this paper, a novel global-local feature coupling network is proposed for the remote sensing scene classification task. First, to address the computational cost of the conventional attention mechanism in ViT, a new dynamic dual-grained attention is proposed, which exploits sparsity to save computation while involving only GPU-friendly dense matrix multiplications. Furthermore, to better integrate global and local detail features, an adaptive coupling module is designed to facilitate the mixing and interaction of information from the two branches, significantly enhancing the representational capability of the extracted features. Extensive experimental results on the three datasets demonstrate the effectiveness of the global-local feature coupling network.
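The two coupling operations of the adaptive coupling module, spatial coupling followed by channel coupling, can likewise be sketched in plain Python. The abstract does not specify the gating functions, so the sigmoid gates below are illustrative stand-ins; only the overall structure (per-token reweighting of the global/local mix, then per-channel reweighting) follows the description in the text.

```python
import math

def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def adaptive_coupling(global_feat, local_feat):
    """Fuse a ViT-branch (global) and a convolution-branch (local) feature
    map, both of shape [n_tokens][n_channels]. The concrete gate formulas
    are hypothetical; the spatial-then-channel structure is from the text."""
    n, d = len(global_feat), len(global_feat[0])
    # spatial coupling: one weight per token decides the global/local mix
    fused = []
    for i in range(n):
        a = _sigmoid(sum(global_feat[i]) / d - sum(local_feat[i]) / d)
        fused.append([a * global_feat[i][t] + (1 - a) * local_feat[i][t]
                      for t in range(d)])
    # channel coupling: reweight each channel from its token-wise mean
    gate = [_sigmoid(sum(fused[i][t] for i in range(n)) / n) for t in range(d)]
    return [[gate[t] * fused[i][t] for t in range(d)] for i in range(n)]
```

Because the spatial gate forms a convex combination of the two branches at every token, each fused value stays between the corresponding global and local responses before the channel gate rescales it, which is one simple way to realize the adaptive reweighting the module describes.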
场景分类；遥感图像；全局和局部特征；耦合模块；注意力机制
scene classification; remote sensing images; global and local features; coupling module; attention mechanism