Gated cross-modal aggregation network for multi-source remote sensing data classification
2024, Pages: 1-12
Online publication date: 2024-09-10
DOI: 10.11834/jig.240359
Jin Xuepeng, Gao Feng, Shi Xiaochen, et al. Gated cross-modal aggregation network for multi-source remote sensing data classification[J]. Journal of Image and Graphics, 2024: 1-12 [DOI: 10.11834/jig.240359]
Objective
In recent years, multi-source remote sensing data fusion has emerged as a research hotspot in remote sensing applications. This trend aims to overcome the technical limitations of individual sensors and the constraints of relying on a single data source. Methods that depend on a single sensor type struggle to provide comprehensive and accurate information: hyperspectral sensors capture detailed spectral information but may lack spatial resolution, while LiDAR and SAR sensors excel at capturing structural information but provide little spectral detail. Integrating hyperspectral images with LiDAR/SAR data therefore holds great promise for remote sensing applications. However, current fusion classification methods have not fully utilized the rich spectral features of hyperspectral images or the ground-object structural information provided by LiDAR/SAR data. Hyperspectral images contain abundant spectral information that can distinguish different materials, LiDAR provides 3D structural information, and SAR offers high-resolution imaging under almost all weather conditions; these fundamentally different data characteristics pose substantial challenges for correlating features across modalities. Although some deep learning-based methods have shown promising results in the fusion classification of hyperspectral and LiDAR/SAR data, they often fall short of fully exploiting the texture and geometric information embedded in the multi-source data during fusion, and they tend to lack the robustness and versatility needed for broader applications. Consequently, there is a pressing need for more sophisticated approaches that leverage the complementary strengths of different data sources to improve classification accuracy and reliability.
Method
To address this critical issue, this paper proposes a multi-source remote sensing image classification method based on a gated cross-modal aggregation network (GCA-Net), which aims to exploit the complementary information in multi-source data more comprehensively. The core innovation lies in integrating the distinct advantages of different sensor data types through two cooperating modules. First, a gated cross-modal aggregation module organically fuses the fine ground-object structural information from LiDAR/SAR data with the spectral features of hyperspectral images. This module uses cross-attention feature fusion, which lets the model focus on the most relevant features from each modality and strengthens the joint representation of the fused data. Second, a refined gating module integrates valuable LiDAR/SAR features into the hyperspectral image features. Acting as a filter, the gate selectively incorporates the features that contribute most to the classification task, so the fused representation retains the critical information from both sources. This design improves classification accuracy and enhances the robustness of the model across datasets and scenarios.
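The abstract does not provide concrete layer definitions, so the following minimal PyTorch sketch only illustrates the general pattern described above: hyperspectral tokens query LiDAR/SAR tokens via cross-attention, and a sigmoid gate then filters how much of the attended auxiliary feature is injected back into the hyperspectral stream. All names and shapes (GatedCrossModalBlock, embed_dim, the token layout) are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a gated cross-modal aggregation block (not the authors' code).
# Assumes both modalities are already embedded into token sequences of equal
# dimension, e.g. by patch-wise CNN stems.
import torch
import torch.nn as nn

class GatedCrossModalBlock(nn.Module):
    def __init__(self, embed_dim: int = 64, num_heads: int = 4):
        super().__init__()
        # Cross-attention: hyperspectral tokens (queries) attend to
        # LiDAR/SAR tokens (keys/values) to pull in structural cues.
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Gate: a per-channel value in [0, 1] deciding how much of the
        # attended auxiliary feature enters the hyperspectral stream.
        self.gate = nn.Sequential(nn.Linear(2 * embed_dim, embed_dim), nn.Sigmoid())
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, hsi_tokens: torch.Tensor, aux_tokens: torch.Tensor) -> torch.Tensor:
        # hsi_tokens: (B, N, C) hyperspectral features; aux_tokens: (B, M, C) LiDAR/SAR features
        attended, _ = self.cross_attn(hsi_tokens, aux_tokens, aux_tokens)
        g = self.gate(torch.cat([hsi_tokens, attended], dim=-1))
        return self.norm(hsi_tokens + g * attended)  # gated residual fusion

# Usage: fuse a 7x7 patch (49 tokens) of hyperspectral features with SAR features.
block = GatedCrossModalBlock(embed_dim=64, num_heads=4)
fused = block(torch.randn(8, 49, 64), torch.randn(8, 49, 64))
print(fused.shape)  # torch.Size([8, 49, 64])
```

Channels whose gate value is near zero contribute almost nothing to the fused feature, which is the filtering behavior the refined gating module is described as performing.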
Results
The proposed method was evaluated on two widely used benchmark datasets, Houston2013 and Augsburg, which cover a diverse range of scenes and conditions. GCA-Net was compared with seven mainstream methods and achieved the best performance in terms of overall accuracy (OA), average accuracy (AA), and the Kappa coefficient, outperforming the other methods by a clear margin. On the Augsburg dataset in particular, which mixes urban and rural scenes with diverse textures and structures, the proposed method achieved the best scores in most land-cover categories, demonstrating its robustness and versatility. The classification visualizations likewise show a clear performance advantage: GCA-Net produces more precise and consistent classification maps than the competing methods, underscoring its practical potential for real-world remote sensing applications.
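OA, AA, and the Kappa coefficient are standard statistics derived from the confusion matrix; the short sketch below (an illustration of the metrics themselves, not the paper's evaluation code) shows how each is computed.

```python
# Computing OA, AA, and Kappa from a confusion matrix.
import numpy as np

def classification_metrics(conf: np.ndarray):
    """conf[i, j] = number of samples of true class i predicted as class j."""
    total = conf.sum()
    diag = np.diag(conf)
    oa = diag.sum() / total                # Overall Accuracy: fraction correct
    aa = np.mean(diag / conf.sum(axis=1))  # Average Accuracy: mean per-class recall
    # Kappa: agreement beyond chance; pe is the agreement expected from the marginals.
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / total ** 2
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa

# Toy 3-class example (values are illustrative, not from the paper).
conf = np.array([[50, 2, 1], [3, 45, 2], [0, 4, 43]])
print(classification_metrics(conf))
```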
Conclusion
The experimental results on the Houston2013 and Augsburg datasets provide strong evidence that the proposed GCA-Net achieves excellent performance, significantly surpassing current mainstream methods such as HCT (hierarchical CNN and Transformer) and MACN (mixing self-attention and convolution network). A key factor in this success is GCA-Net's ability to fully integrate information from different modalities according to their individual characteristics, which lets the model leverage the strengths of each data source for more accurate and reliable classification. The superior performance can be attributed to the gated attention mechanism and cross-modal feature fusion, which enable the model to selectively focus on the most relevant features from each modality and thereby improve the quality of the fused representation. By effectively combining the spectral richness of hyperspectral images with the structural detail of LiDAR/SAR data, GCA-Net sets a strong benchmark for multi-source remote sensing data fusion. In conclusion, GCA-Net provides theoretical support for the fusion classification of multi-source remote sensing data; by addressing the limitations of single-sensor approaches and exploiting the complementary strengths of different data types, it offers a valuable tool for remote sensing applications such as environmental monitoring, urban planning, and disaster management.
Hyperspectral image; Light Detection and Ranging (LiDAR); Synthetic Aperture Radar (SAR); backscattering; multi-source feature fusion
Arshad T and Zhang J. 2024. Hierarchical Attention Transformer for Hyperspectral Image Classification. IEEE Geoscience and Remote Sensing Letters [DOI: 10.1109/LGRS.2024.3379509]
Bioucas-Dias J M, Plaza A, Camps-Valls G, Scheunders P, Nasrabadi N and Chanussot J. 2013. Hyperspectral Remote Sensing Data Analysis and Future Challenges. IEEE Geoscience and Remote Sensing Magazine, 1(2): 6-36 [DOI: 10.1109/MGRS.2013.2244672]
Baumgartner A, Gege P, Köhler C, Lenhard K and Schwarzmaier T. 2012. Characterisation methods for the hyperspectral sensor HySpex at DLR's calibration home base. In: Proceedings of the SPIE, Vol. 8533, Remote Sensing of the Ocean, Sea Ice, Coastal Waters, and Large Water Regions, Edinburgh, United Kingdom, September 23-27 [DOI: 10.1117/12.974664]
Feng M, Gao F, Fang J and Dong J. 2021. Hyperspectral and LiDAR Data Classification Based on Linear Self-Attention. In: 2021 IEEE International Geoscience and Remote Sensing Symposium (IGARSS): 2401-2404 [DOI: 10.1109/IGARSS47720.2021.9553769]
Fang S, Li K and Li Z. 2022. S²ENet: Spatial–Spectral Cross-Modal Enhancement Network for Classification of Hyperspectral and LiDAR Data. IEEE Geoscience and Remote Sensing Letters, 19: 1-5 [DOI: 10.1109/LGRS.2021.3121028]
Gao F, Meng D S, Xie Z Y, Qi L and Dong J Y. 2024. Classification of multi-source remote sensing images based on Transformer and dynamic 3D convolution. Journal of Beijing University of Aeronautics and Astronautics, 50(2): 606-614 [DOI: 10.13700/j.bh.1001-5965.2022.0397]
Hong D, Gao L, Hang R, Zhang B and Chanussot J. 2022. Deep Encoder–Decoder Networks for Classification of Hyperspectral and LiDAR Data. IEEE Geoscience and Remote Sensing Letters, 19: 1-5 [DOI: 10.1109/LGRS.2020.3017414]
Hu S, Gao F, Gong Z R, Tao S E, Shangguan X Y and Dong J Y. 2024. Parallel channel shuffling and Transformer-based denoising for hyperspectral images. Journal of Image and Graphics, 29(7): 2063-2074 [DOI: 10.11834/jig.230381]
Li W, Wang J, Gao Y, Zhang M, Tao R and Zhang B. 2022. Graph-Feature-Enhanced Selective Assignment Network for Hyperspectral and Multispectral Data Classification. IEEE Transactions on Geoscience and Remote Sensing, 60: 1-14 [DOI: 10.1109/TGRS.2022.3166252]
Li W, Gao Y, Zhang M, Tao R and Du Q. 2023. Asymmetric Feature Fusion Network for Hyperspectral and SAR Image Classification. IEEE Transactions on Neural Networks and Learning Systems, 34(10): 8057-8070 [DOI: 10.1109/TNNLS.2022.3149394]
Li K, Wang D, Wang X, Liu G, Wu Z and Wang Q. 2023. Mixing Self-Attention and Convolution: A Unified Framework for Multisource Remote Sensing Data Classification. IEEE Transactions on Geoscience and Remote Sensing, 61: 1-16 [DOI: 10.1109/TGRS.2023.3310521]
Ma W, Guo Y, Zhu H, Yi X, Zhao W, Wu Y, Hou B and Jiao L. 2024. Intra- and Intersource Interactive Representation Learning Network for Remote Sensing Images Classification. IEEE Transactions on Geoscience and Remote Sensing, 62: 1-15 [DOI: 10.1109/TGRS.2024.3352816]
Moreira A, Prats-Iraola P, Younis M, Krieger G, Hajnsek I and Papathanassiou K P. 2013. A tutorial on synthetic aperture radar. IEEE Geoscience and Remote Sensing Magazine, 1(1): 6-43 [DOI: 10.1109/MGRS.2013.2248301]
Man Q, Dong P and Guo H. 2015. Pixel- and feature-level fusion of hyperspectral and LiDAR data for urban land-use classification. International Journal of Remote Sensing, 36(6): 1618-1644 [DOI: 10.1080/01431161.2015.1015657]
Melgani F and Bruzzone L. 2004. Classification of hyperspectral remote sensing images with support vector machines. IEEE Transactions on Geoscience and Remote Sensing, 42(8): 1778-1790 [DOI: 10.1109/TGRS.2004.831865]
Roy S K, Sukul A, Jamali A, Haut J M and Ghamisi P. 2024. Cross Hyperspectral and LiDAR Attention Transformer: An Extended Self-Attention for Land Use and Land Cover Classification. IEEE Transactions on Geoscience and Remote Sensing, 62: 1-15 [DOI: 10.1109/TGRS.2024.3374324]
Roy S K, Krishna G, Dubey S R and Chaudhuri B B. 2020. HybridSN: Exploring 3-D–2-D CNN Feature Hierarchy for Hyperspectral Image Classification. IEEE Geoscience and Remote Sensing Letters, 17(2): 277-281 [DOI: 10.1109/LGRS.2019.2918719]
Wang J, Gao F, Dong J and Du Q. 2021. Adaptive DropBlock-Enhanced Generative Adversarial Networks for Hyperspectral Image Classification. IEEE Transactions on Geoscience and Remote Sensing, 59(6): 5040-5053 [DOI: 10.1109/TGRS.2020.3015843]
Xiao Z J, Lin B H and Qu H C. 2024. SAR ship detection with multi-mechanism fusion. Journal of Image and Graphics, 29(2): 545-558 [DOI: 10.11834/jig.230166]
Xu X, Li W, Ran Q, Du Q, Gao L and Zhang B. 2018. Multisource Remote Sensing Data Classification Based on Convolutional Neural Network. IEEE Transactions on Geoscience and Remote Sensing, 56(2): 937-949 [DOI: 10.1109/TGRS.2017.2756851]
Yao J, Zhang B, Li C, Hong D and Chanussot J. 2023. Extended Vision Transformer (ExViT) for Land Use and Land Cover Classification: A Multimodal Deep Learning Framework. IEEE Transactions on Geoscience and Remote Sensing, 61: 1-15 [DOI: 10.1109/TGRS.2023.3284671]
Zhao X, Tao R, Li W, Li H-C, Du Q, Liao W and Philips W. 2020. Joint Classification of Hyperspectral and LiDAR Data Using Hierarchical Random Walk and Deep CNN Architecture. IEEE Transactions on Geoscience and Remote Sensing, 58(10): 7355-7370 [DOI: 10.1109/TGRS.2020.2982064]
Zhao G, Ye Q, Sun L, Wu Z, Pan C and Jeon B. 2023. Joint Classification of Hyperspectral and LiDAR Data Using a Hierarchical CNN and Transformer. IEEE Transactions on Geoscience and Remote Sensing, 61: 1-16 [DOI: 10.1109/TGRS.2022.3232498]