Global perception and detail enhancement network for building segmentation in remote sensing images
2025, pp. 1-18
Online publication date: 2025-01-23
Accepted: 2025-01-16
DOI: 10.11834/jig.240629
Xu S J, Liu Y R, Liu E H, Liu J, Shi Y and Li X H. 2025. Global perception and detail enhancement network for building segmentation in remote sensing image[J/OL]. Journal of Image and Graphics: 1-18.
Objective
To address the low accuracy of building segmentation in remote sensing images caused by poor regional continuity, vanishing boundaries and large scale variations, an asymmetric remote sensing building segmentation network based on global perception and detail enhancement (Global Perception and Detail Enhancement Asymmetric-UNet, GPDEA-UNet) is proposed.
Method
Built on the U-Net architecture, the network first constructs a feature encoder module based on a selective state space model, which uses the visual state space (VSS) block as its basic unit and combines it with dynamic convolution decomposition (DCD) to capture complex features and context information in remote sensing images. Second, a multi-scale dual cross-attention (MDCA) module is introduced to resolve the channel and spatial dependencies among multi-scale encoder features and to narrow the semantic gap between encoder and decoder features. Finally, a detail enhancement decoder module is designed that uses DCD together with a cascade upsampling (CU) module to recover richer semantic information, preserving feature details and semantic integrity and ensuring accurate, fine-grained segmentation results.
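To make the dynamic-kernel idea behind DCD concrete, the sketch below aggregates K candidate convolution kernels with attention weights predicted from a globally pooled descriptor of the input. This is a minimal illustration of the general dynamic convolution principle, not the paper's exact DCD formulation; all array names and shapes are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def aggregate_dynamic_kernel(x, kernels, proj):
    """Mix K candidate kernels with input-conditioned attention weights.

    x       : input feature map, shape (C, H, W)
    kernels : K candidate kernels, shape (K, C_out, C, kh, kw)
    proj    : projection from pooled descriptor to kernel logits, shape (C, K)
    returns : one input-conditioned kernel, shape (C_out, C, kh, kw)
    """
    g = x.mean(axis=(1, 2))            # global average pooling -> (C,)
    a = softmax(g @ proj)              # attention over the K candidate kernels
    return np.tensordot(a, kernels, axes=(0, 0))

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))
kernels = rng.standard_normal((4, 12, 8, 3, 3))
proj = rng.standard_normal((8, 4))
w = aggregate_dynamic_kernel(x, kernels, proj)   # shape (12, 8, 3, 3)
```

The aggregated kernel `w` would then be applied as an ordinary convolution; conditioning the kernel on the input is what allows such an encoder to adapt to the varied textures and structures of buildings.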
Result
Comparative experiments were conducted against a variety of methods on the WHU Building Dataset and the Massachusetts Building Dataset. The results show that the proposed GPDEA-UNet achieves an IoU, precision, recall and F1-score of 91.60%, 95.36%, 95.89% and 95.62% on the WHU Building Dataset, and 72.51%, 79.44%, 86.81% and 82.53% on the Massachusetts Building Dataset.
Conclusion
The proposed asymmetric remote sensing building segmentation network based on global perception and detail enhancement can effectively improve the accuracy of building segmentation in remote sensing imagery.
Objective
Remote sensing imagery is a type of Earth observation data characterized by wide coverage, rich spectral information and variable target structures. With the advancement of computer technology, the demand for accurate and efficient building extraction across diverse domains and industries has been steadily increasing, and the practical value of semantic segmentation techniques for remote sensing imagery has become increasingly evident. Semantic segmentation of remote sensing images can efficiently extract detailed information such as the spatial distribution and density of buildings and other infrastructure, which plays a crucial role in land surveying, urban planning and post-disaster assessment. However, the growing resolution and scene complexity of such imagery have also made semantic segmentation of buildings more difficult. Consequently, efficiently and accurately extracting building information from high-resolution imagery has become a pivotal and urgent concern in remote sensing semantic segmentation. In recent years, deep learning has achieved notable advances in this field, owing to its ability to learn arbitrary data distributions without prior statistical knowledge of the input, its capacity to learn target features automatically, and its strong generalization ability. Nevertheless, semantic segmentation of buildings in remote sensing images still faces substantial obstacles, primarily strong interference from varying lighting conditions, seasonal changes and complex background information, as well as the intricate structures and edge details of the buildings themselves.
To address these challenges, this paper proposes a global perception and detail enhancement asymmetric-UNet (GPDEA-UNet) for semantic segmentation of buildings in remote sensing images.
Method
First, based on the U-Net architecture, the proposed network constructs a feature encoder built on a selective state space module. This encoder is designed to extract the texture, boundary and deep semantic features of buildings in remote sensing images. It uses the visual state space (VSS) block as its fundamental unit and incorporates dynamic convolution decomposition (DCD) to enhance the extraction of intricate features and context information while reducing computational overhead. Second, to broaden the network's global receptive field and to address the semantic discrepancy between encoder and decoder features at the skip connections, a multi-scale dual cross-attention (MDCA) module is introduced. MDCA is an attention-weighting mechanism that integrates cross-channel attention (CCA) and cross-spatial attention (CSA). It strengthens the network's ability to extract and fuse feature information for the target's region and boundary, resolves the interdependencies among multi-scale encoder features in both the channel and spatial dimensions, and thereby bridges the semantic gap between encoder and decoder features. Finally, to counteract the loss of image detail during upsampling, a detail enhancement decoder is designed to restore the resolution of the extracted feature maps. It builds on DCD and incorporates a cascade upsampling (CU) module that captures richer semantic information and retains feature details and semantic integrity, ensuring accurate and fine-grained segmentation results. By integrating these components, the network achieves a specialized and nuanced segmentation of buildings in remote sensing images.
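A minimal sketch of the cross-channel half of such a dual cross-attention, assuming feature maps flattened to shape (channels, pixels); the actual MDCA design (its spatial branch, multi-scale token layout and learned projections) is not reproduced here.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def channel_cross_attention(f_dec, f_enc):
    """Each decoder-side channel (query) attends over encoder-side channels
    (keys/values), re-expressing it as a data-dependent mixture of encoder
    channels -- one way cross-channel attention can bridge the semantic gap
    at a skip connection. Shapes: (C, N) with N = H*W pixels flattened."""
    n = f_dec.shape[1]
    scores = (f_dec @ f_enc.T) / np.sqrt(n)   # (C_dec, C_enc) channel affinities
    attn = softmax(scores, axis=-1)           # each row sums to 1
    return attn @ f_enc                        # (C_dec, N)

rng = np.random.default_rng(1)
f_enc = rng.standard_normal((32, 64))   # encoder features, 32 channels
f_dec = rng.standard_normal((16, 64))   # decoder features, 16 channels
out = channel_cross_attention(f_dec, f_enc)   # shape (16, 64)
```

Because attention is computed between whole channels rather than pixels, the cost scales with the (small) channel count instead of the image size, which is what makes this style of attention practical at high resolutions.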
Result
Experimental results demonstrate the exceptional robustness of the GPDEA-UNet network, introduced in this paper, across various datasets. Specifically, on the WHU building dataset, the network achieved an intersection over union (IoU) of 91.60%, precision of 95.36%, recall of 95.89%, and an F1-score of 95.62%. Similarly, on the Massachusetts building dataset, the network attained an IoU of 73.51%, precision of 79.44%, recall of 86.81%, and an F1-score of 82.53%. When compared to other state-of-the-art networks, the quantitative indicators reveal that the GPDEA-UNet network attains optimal performance on the WHU dataset and either optimal or near-optimal performance on the Massachusetts dataset. Furthermore, qualitative analysis demonstrates that the proposed network achieves superior segmentation results on both the WHU and Massachusetts datasets. The network maintains high-quality segmentation even for remote sensing images with inferior imaging quality, such as those with low resolution, noise, or occlusion.
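The four reported metrics follow the standard confusion-matrix definitions. A minimal sketch for binary building masks (array names and the toy inputs are illustrative):

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """IoU, precision, recall and F1 for binary masks (1 = building pixel)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()     # building predicted as building
    fp = np.logical_and(pred, ~gt).sum()    # background predicted as building
    fn = np.logical_and(~pred, gt).sum()    # building predicted as background
    iou = tp / (tp + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return iou, precision, recall, f1

# toy 2x2 masks: tp = 1, fp = 1, fn = 1
pred = np.array([[1, 1], [0, 0]])
gt = np.array([[1, 0], [1, 0]])
iou, p, r, f1 = segmentation_metrics(pred, gt)
# iou = 1/3, precision = recall = f1 = 0.5
```

Note that IoU is always the strictest of the four: it penalizes false positives and false negatives simultaneously, which is why it is the headline number for segmentation benchmarks.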
Conclusion
Combining a selective state space module with a multi-scale dual cross-attention mechanism, an asymmetric remote sensing building segmentation network with global perception and detail enhancement is proposed. Experiments on two remote sensing datasets show that the proposed network effectively improves both the accuracy and the visual quality of remote sensing building segmentation. Furthermore, the network exhibits remarkable robustness and versatility: the high precision and recall achieved in our experiments underscore its capability not only on high-quality imagery but also in challenging scenarios. These results indicate that the proposed network has strong generality and application potential in remote sensing image segmentation and offers a new approach for remote sensing image processing research and applications.
Keywords: remote sensing images; building segmentation; visual state space; dynamic convolution decomposition; cross-attention; detail enhancement