Gated cross-modal aggregation network for multi-source remote sensing data classification
2025, Vol. 30, No. 3, pp. 883-894
Received: 2024-06-27
Revised: 2024-09-04
Published in print: 2025-03-16
DOI: 10.11834/jig.240359
Objective
To overcome the technical limitations of single sensors and compensate for the shortcomings of relying on a single data source, multisource remote sensing data fusion has become a research hotspot in remote sensing applications. Current fusion classification methods for hyperspectral images and LiDAR (light detection and ranging)/SAR (synthetic aperture radar) data fail to fully exploit the spectral features of hyperspectral images and the structural information of ground objects in LiDAR/SAR data. Because images from different imaging modalities differ fundamentally in their data characteristics, correlating multisource image features is a major challenge. Although some deep learning-based methods have shown excellent results in classification tasks combining hyperspectral and LiDAR/SAR data, they fail to fully exploit the texture and geometric information in multisource data during fusion.
Method
To address this key issue, a multisource remote sensing image classification method based on a gated attention aggregation network is proposed, which can more comprehensively mine the complementary information in multisource data. First, a gated cross-modal aggregation module is designed, which uses cross-attention feature fusion to organically integrate the fine structural information of ground objects in LiDAR/SAR data with hyperspectral image features. Then, a refined gating module integrates key LiDAR/SAR features into the hyperspectral image features, thereby enhancing the fusion of multisource data.
Result
Experiments on the Houston2013 and Augsburg datasets compare the proposed method with seven mainstream methods; it achieves the best performance in overall accuracy (OA), average accuracy (AA), and the Kappa coefficient (Kappa). On the Augsburg dataset in particular, the proposed method achieves the best scores in most categories. The classification visualizations also clearly show that the proposed model has a marked performance advantage.
Conclusion
The experimental results show that the proposed GCA-Net (gated cross-modal aggregation network) performs excellently, clearly outperforming mainstream methods such as HCT (hierarchical CNN and Transformer) and MACN (mixing self-attention and convolutional network). The method can fully fuse information from different modalities according to their characteristics for classification, providing theoretical support for the fusion classification of multisource remote sensing data.
Objective
In recent years, multisource remote sensing data fusion has emerged as a research hotspot in remote sensing applications, driven by the need to overcome the technical limitations of single sensors and the constraints of relying on a single data source. Traditional remote sensing methods, which often depend on a single type of sensor, face considerable challenges in providing comprehensive and accurate information due to the inherent limitations of those sensors. For instance, hyperspectral sensors capture detailed spectral information but may lack spatial resolution, while LiDAR (light detection and ranging) and SAR (synthetic aperture radar) sensors excel at capturing structural information but fail to provide sufficient spectral detail. The integration of hyperspectral images and LiDAR/SAR data therefore holds remarkable promise for enhancing remote sensing applications. However, current fusion classification methods have not fully utilized the rich spectral features of hyperspectral images and the structural information of ground objects provided by LiDAR/SAR data. The two types of data have fundamentally different characteristics, which poses considerable challenges for effective feature correlation: hyperspectral images contain abundant spectral information that can distinguish different materials, LiDAR provides 3D structural information, and SAR offers high-resolution imaging under various weather conditions. Although some deep learning-based methods have shown promising results in the fusion classification of hyperspectral and LiDAR/SAR data, they often fall short in fully exploiting the texture and geometric information embedded in the multisource data during fusion. These methods may perform well in specific scenarios but often lack the robustness and versatility needed for broader applications. Consequently, more sophisticated approaches that can leverage the complementary strengths of different data sources are urgently needed to improve classification accuracy and reliability.
Method
This paper proposes a novel multisource remote sensing image classification method based on a gated cross-modal aggregation network (GCA-Net) to address this critical issue. The approach aims to comprehensively exploit the complementary information available in multisource data, and its core innovation lies in effectively integrating the unique advantages of different sensor data types through a series of dedicated neural network modules. First, a gated cross-modal aggregation module is designed to organically integrate the detailed structural information from LiDAR/SAR data with the spectral features of hyperspectral images. This module leverages cross-attention mechanisms to enhance feature fusion, allowing the model to focus on the most relevant features from each modality and ensuring that the distinctive information from each data source is effectively utilized. Second, a refined gating module integrates valuable LiDAR/SAR features into the hyperspectral image features. The gating mechanism acts as a filter that selectively incorporates the features that contribute most to the classification task, ensuring that the fused representation retains the critical information from both sources. This approach not only improves classification accuracy but also enhances the robustness of the model across different datasets and scenarios.
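To make the two mechanisms concrete, the following PyTorch sketch combines cross-attention fusion of hyperspectral and LiDAR/SAR features with a learned gate. It is a minimal illustration of the general technique under our own assumptions, not the authors' published implementation; the module and parameter names (GatedCrossModalFusion, embed_dim, num_heads) are illustrative.

import torch
import torch.nn as nn

class GatedCrossModalFusion(nn.Module):
    """Illustrative sketch: cross-attention plus gated residual fusion."""

    def __init__(self, embed_dim: int = 64, num_heads: int = 4):
        super().__init__()
        # Cross-attention: hyperspectral tokens query the LiDAR/SAR tokens,
        # retrieving structural cues for each spectral position.
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)
        # Gate: decides, per channel, how much of the attended LiDAR/SAR
        # information to inject into the hyperspectral stream.
        self.gate = nn.Sequential(nn.Linear(2 * embed_dim, embed_dim), nn.Sigmoid())

    def forward(self, hsi_tokens: torch.Tensor, aux_tokens: torch.Tensor) -> torch.Tensor:
        # hsi_tokens: (B, N, C) hyperspectral features
        # aux_tokens: (B, M, C) LiDAR/SAR features
        attended, _ = self.cross_attn(query=hsi_tokens, key=aux_tokens, value=aux_tokens)
        g = self.gate(torch.cat([hsi_tokens, attended], dim=-1))
        # Gated residual fusion keeps the spectral stream intact while
        # selectively adding structural information.
        return self.norm(hsi_tokens + g * attended)

# Usage: fuse 7x7 spatial patches (49 tokens) from both modalities.
fusion = GatedCrossModalFusion(embed_dim=64, num_heads=4)
hsi = torch.randn(2, 49, 64)
aux = torch.randn(2, 49, 64)
out = fusion(hsi, aux)  # shape: (2, 49, 64)

The gated residual form means that when the gate saturates near zero the module degrades gracefully to the unfused hyperspectral features, which is one common way such fusion blocks are kept stable during early training.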
Result
The proposed method was rigorously evaluated on two widely used datasets: Houston2013 and Augsburg. These datasets cover a diverse range of scenes and conditions, making them well suited for assessing the effectiveness of the proposed method. The performance of the GCA-Net was compared with seven mainstream methods. Experimental results show that the proposed method achieves the best performance in terms of overall accuracy (OA), average accuracy (AA), and the Kappa coefficient (Kappa), outperforming the other methods by a substantial margin and highlighting its superior capability to fuse and classify multisource remote sensing data. In particular, on the Augsburg dataset, the proposed method achieves the best scores in most categories; this dataset includes a variety of urban and rural scenes that challenge classification models with diverse textures and structures, and the capability of the GCA-Net to handle such diversity demonstrates its robustness and versatility. Moreover, the classification visualizations show a notable performance advantage: the GCA-Net produces more precise and consistent classification maps than the other methods, providing clear and intuitive evidence of its practical benefits for real-world remote sensing applications.
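For reference, the three reported metrics can be computed from a confusion matrix as sketched below in NumPy. These are the standard definitions of OA, AA, and Kappa, not code from the paper; the function name classification_metrics is our own.

import numpy as np

def classification_metrics(conf: np.ndarray):
    """Compute OA, AA, and Kappa from a confusion matrix
    (rows: ground truth, columns: prediction)."""
    total = conf.sum()
    oa = np.trace(conf) / total                     # overall accuracy
    per_class = np.diag(conf) / conf.sum(axis=1)    # per-class accuracy (recall)
    aa = per_class.mean()                           # average accuracy
    # Expected agreement by chance, from row/column marginals.
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / total ** 2
    kappa = (oa - pe) / (1 - pe)                    # Kappa coefficient
    return oa, aa, kappa

# Example with a hypothetical 3-class confusion matrix.
conf = np.array([[50, 2, 3],
                 [4, 45, 1],
                 [2, 3, 40]])
print(classification_metrics(conf))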
Conclusion
The experimental results on the Houston2013 and Augsburg datasets strongly support the excellent performance of the proposed GCA-Net, which substantially surpasses current mainstream methods such as HCT (hierarchical CNN and Transformer) and MACN (mixing self-attention and convolutional network). A key factor in the success of the GCA-Net is its capability to fully integrate information from different modalities based on their unique characteristics, allowing the model to leverage the strengths of each data source for highly accurate and reliable classification. The superior performance of the proposed method can be attributed to its gated attention mechanisms and cross-modal feature fusion, which enable the model to selectively focus on the most relevant features from each modality and thus enhance the quality of the fused representation. By effectively combining the spectral richness of hyperspectral images with the structural details of LiDAR/SAR data, the GCA-Net sets a new benchmark for multisource remote sensing data fusion. Overall, the GCA-Net provides a solid theoretical foundation for the fusion classification of multisource remote sensing data. Its capability to address the limitations of single-sensor approaches and leverage the complementary strengths of different data types makes it a valuable tool for advancing remote sensing applications, improving both classification accuracy and practical applicability and paving the way for more sophisticated and effective solutions in domains such as environmental monitoring, urban planning, and disaster management.
References
Arshad T and Zhang J P. 2024. Hierarchical attention Transformer for hyperspectral image classification. IEEE Geoscience and Remote Sensing Letters, 21: #5504605 [DOI: 10.1109/LGRS.2024.3379509]
Baumgartner A, Gege P, Köhler C, Lenhard K and Schwarzmaier T. 2012. Characterisation methods for the hyperspectral sensor HySpex at DLR's calibration home base // Proceedings Volume 8533, Sensors, Systems, and Next-Generation Satellites XVI. Edinburgh, United Kingdom: SPIE: 371-378 [DOI: 10.1117/12.974664]
Bioucas-Dias J M, Plaza A, Camps-Valls G, Scheunders P, Nasrabadi N and Chanussot J. 2013. Hyperspectral remote sensing data analysis and future challenges. IEEE Geoscience and Remote Sensing Magazine, 1(2): 6-36 [DOI: 10.1109/MGRS.2013.2244672]
Chen Y S, Li C Y, Ghamisi P, Jia X P and Gu Y F. 2017. Deep fusion of remote sensing data for accurate classification. IEEE Geoscience and Remote Sensing Letters, 14(8): 1253-1257 [DOI: 10.1109/LGRS.2017.2704625]
Fang S, Li K Y and Li Z. 2022. S²ENet: spatial-spectral cross-modal enhancement network for classification of hyperspectral and LiDAR data. IEEE Geoscience and Remote Sensing Letters, 19: #6504205 [DOI: 10.1109/LGRS.2021.3121028]
Gao F, Meng D S, Xie Z Y, Qi L and Dong J Y. 2024. Multi-source remote sensing image classification based on Transformer and dynamic 3D-convolution. Journal of Beijing University of Aeronautics and Astronautics, 50(2): 606-614 [DOI: 10.13700/j.bh.1001-5965.2022.0397]
Hong D F, Gao L R, Hang R L, Zhang B and Chanussot J. 2022. Deep encoder-decoder networks for classification of hyperspectral and LiDAR data. IEEE Geoscience and Remote Sensing Letters, 19: #5500205 [DOI: 10.1109/LGRS.2020.3017414]
Hu S, Gao F, Gong Z R, Tao S E, Shangguan X Y and Dong J Y. 2024. Parallel channel shuffling and Transformer-based denoising for hyperspectral images. Journal of Image and Graphics, 29(7): 2063-2074 [DOI: 10.11834/jig.230381]
Li K, Wang D, Wang X, Liu G, Wu Z L and Wang Q. 2023a. Mixing self-attention and convolution: a unified framework for multisource remote sensing data classification. IEEE Transactions on Geoscience and Remote Sensing, 61: #5523216 [DOI: 10.1109/TGRS.2023.3310521]
Li W, Gao Y H, Zhang M M, Tao R and Du Q. 2023b. Asymmetric feature fusion network for hyperspectral and SAR image classification. IEEE Transactions on Neural Networks and Learning Systems, 34(10): 8057-8070 [DOI: 10.1109/TNNLS.2022.3149394]
Li W, Wang J J, Gao Y H, Zhang M M, Tao R and Zhang B. 2022. Graph-feature-enhanced selective assignment network for hyperspectral and multispectral data classification. IEEE Transactions on Geoscience and Remote Sensing, 60: #5526914 [DOI: 10.1109/TGRS.2022.3166252]
Ma W P, Guo Y S, Zhu H, Yi X Y, Zhao W H, Wu Y, Hou B and Jiao L C. 2024. Intra- and intersource interactive representation learning network for remote sensing images classification. IEEE Transactions on Geoscience and Remote Sensing, 62: #5401515 [DOI: 10.1109/TGRS.2024.3352816]
Man Q X, Dong P L and Guo H D. 2015. Pixel- and feature-level fusion of hyperspectral and LiDAR data for urban land-use classification. International Journal of Remote Sensing, 36(6): 1618-1644 [DOI: 10.1080/01431161.2015.1015657]
Melgani F and Bruzzone L. 2004. Classification of hyperspectral remote sensing images with support vector machines. IEEE Transactions on Geoscience and Remote Sensing, 42(8): 1778-1790 [DOI: 10.1109/TGRS.2004.831865]
Moreira A, Prats-Iraola P, Younis M, Krieger G, Hajnsek I and Papathanassiou K P. 2013. A tutorial on synthetic aperture radar. IEEE Geoscience and Remote Sensing Magazine, 1(1): 6-43 [DOI: 10.1109/MGRS.2013.2248301]
Roy S K, Sukul A, Jamali A, Haut J M and Ghamisi P. 2024. Cross hyperspectral and LiDAR attention Transformer: an extended self-attention for land use and land cover classification. IEEE Transactions on Geoscience and Remote Sensing, 62: #5512815 [DOI: 10.1109/TGRS.2024.3374324]
Wang J J, Gao F, Dong J Y and Du Q. 2021. Adaptive DropBlock-enhanced generative adversarial networks for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 59(6): 5040-5053 [DOI: 10.1109/TGRS.2020.3015843]
Xiao Z J, Lin B H and Qu H C. 2024. SAR ship detection with multi-mechanism fusion. Journal of Image and Graphics, 29(2): 545-558 [DOI: 10.11834/jig.230166]
Xu X D, Li W, Ran Q, Du Q, Gao L R and Zhang B. 2018. Multisource remote sensing data classification based on convolutional neural network. IEEE Transactions on Geoscience and Remote Sensing, 56(2): 937-949 [DOI: 10.1109/TGRS.2017.2756851]
Yao J, Zhang B, Li C Y, Hong D F and Chanussot J. 2023. Extended Vision Transformer (ExViT) for land use and land cover classification: a multimodal deep learning framework. IEEE Transactions on Geoscience and Remote Sensing, 61: #5514415 [DOI: 10.1109/TGRS.2023.3284671]
Zhao G R, Ye Q L, Sun L, Wu Z B, Pan C S and Jeon B. 2023. Joint classification of hyperspectral and LiDAR data using a hierarchical CNN and Transformer. IEEE Transactions on Geoscience and Remote Sensing, 61: #5500716 [DOI: 10.1109/TGRS.2022.3232498]
Zhao X D, Tao R, Li W, Li H C, Du Q, Liao W Z and Philips W. 2020. Joint classification of hyperspectral and LiDAR data using hierarchical random walk and deep CNN architecture. IEEE Transactions on Geoscience and Remote Sensing, 58(10): 7355-7370 [DOI: 10.1109/TGRS.2020.2982064]