加强类别关系的农作物遥感图像语义分割
CRNet: class relation network for crop remote sensing image semantic segmentation
2022年27卷第11期 页码:3382-3394
纸质出版日期: 2022-11-16
录用日期: 2021-11-15
DOI: 10.11834/jig.210760
董荣胜, 马雨琪, 刘意, 李凤英. 加强类别关系的农作物遥感图像语义分割[J]. 中国图象图形学报, 2022,27(11):3382-3394.
Rongsheng Dong, Yuqi Ma, Yi Liu, Fengying Li. CRNet: class relation network for crop remote sensing image semantic segmentation[J]. Journal of Image and Graphics, 2022,27(11):3382-3394.
目的
遥感图像处理技术在农作物规划、植被检测以及农用地监测等方面具有重要的作用。然而农作物遥感图像上存在类别不平衡的问题,部分样本中农作物类间相似度高、类内差异性大,使得农作物遥感图像的语义分割更具挑战性。为了解决这些问题,提出一种融合不同尺度类别关系的农作物遥感图像语义分割网络CRNet(class relation network)。
方法
该网络将ResNet-34作为编码器的主干网络提取图像特征,并采用特征金字塔结构融合高阶语义特征和低阶空间信息,增强网络对图像细节的处理能力。引入类别关系模块获取不同尺度的类别关系,利用一种新的类别特征加强注意力机制(class feature enhancement,CFE)结合通道注意力和加强位置信息的空间注意力,使得农作物类间的语义差异和农作物类内的相关性增大。在解码器中,将不同尺度的类别关系融合,增强了网络对不同尺度农作物特征的识别能力,从而提高了对农作物边界分割的精度。通过数据预处理、数据增强和类别平衡损失函数(class-balanced loss,CB loss)进一步缓解了农作物遥感图像中类别不平衡的问题。
结果
在Barley Remote Sensing数据集上进行的实验表明,CRNet网络的平均交并比(mean intersection over union,MIoU)和总体分类精度(overall accuracy,OA)分别达到68.89%和82.59%,性能在评价指标和可视化效果上均优于PSPNet(pyramid scene parsing network)、FPN(feature pyramid network)、LinkNet、DeepLabv3+、FarSeg(foreground-aware relation network)以及STLNet(statistical texture learning network)。
结论
CRNet网络通过类别关系模块,在遥感图像复杂的地物背景中更加精准地区分相似的不同农作物,识别特征差异大的同种农作物,并融合多级特征使得提取出的目标边界更加清晰完整,提高了分割精度。
Objective
Remote sensing image processing technology plays an important role in crop planning, vegetation detection, and agricultural land monitoring. The purpose of semantic segmentation of crop remote sensing images is to classify the image at the pixel level and partition it into regions with different semantic labels. Compared with natural scenes, the semantic segmentation of crop remote sensing images is challenging in two respects: 1) the number of samples of different categories varies greatly and the distribution is extremely unbalanced; for example, background samples far outnumber those of the remaining categories, which leads to overfitting and poor robustness during network training; 2) different crops often have highly similar appearance features, which makes them difficult for the network to distinguish, while the appearance features of the same crop can differ considerably, which can cause the same crop to be misclassified. To address these problems, we develop a semantic segmentation network for crop remote sensing images, called class relation network (CRNet), which integrates class relations at multiple scales. Our experiments are carried out on the Barley Remote Sensing dataset from the Tianchi Big Data Competition. The dataset consists of 4 large high-resolution remote sensing images, which are too large to be fed into a neural network directly, so each image is first cut into sub-images of 512×512 pixels. After cutting, the dataset contains 11 750 sub-images, including 9 413 images in the training set and 2 337 images in the test set, a ratio of about 4:1.
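As a concrete illustration of this preprocessing step, the following is a minimal sketch of the tiling procedure; the file names, output directory, and the use of Pillow/NumPy are our assumptions for illustration, not details from the paper.

```python
# Hypothetical sketch of the tiling step: cut a large remote sensing
# image into 512x512 sub-images. File names and the use of Pillow/NumPy
# are illustrative assumptions, not details from the paper.
import os
import numpy as np
from PIL import Image

Image.MAX_IMAGE_PIXELS = None  # permit very large remote sensing scenes

def tile_image(path, tile=512):
    """Yield (row, col, array) for every full 512x512 tile of the image."""
    img = np.asarray(Image.open(path))
    h, w = img.shape[:2]
    for r in range(0, h - tile + 1, tile):
        for c in range(0, w - tile + 1, tile):
            yield r, c, img[r:r + tile, c:c + tile]

os.makedirs("tiles", exist_ok=True)
for r, c, sub in tile_image("barley_scene_1.png"):
    Image.fromarray(sub).save(f"tiles/scene1_{r}_{c}.png")
```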
Method
Our CRNet is composed of three parts: an encoder built on a variant of the feature pyramid network, a class relation module, and a decoder. 1) In the encoder, ResNet-34 is used as the backbone network to extract image features progressively from bottom to top. As in the original feature pyramid structure, a top-down pathway with lateral connections fuses high-level semantic features with low-level spatial information, which allows the network to handle image details better.
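To make the encoder concrete, here is a minimal PyTorch sketch of a ResNet-34 backbone with FPN-style lateral connections over three feature levels; the common channel width (128) and all module names are our assumptions, not details specified in the paper.

```python
# Hypothetical sketch of the CRNet encoder: ResNet-34 backbone with
# FPN-style top-down fusion over three feature levels. Names are ours.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet34

class FPNEncoder(nn.Module):
    def __init__(self, out_ch=128):
        super().__init__()
        r = resnet34(weights=None)
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
        self.layer1, self.layer2 = r.layer1, r.layer2   # 64, 128 channels
        self.layer3, self.layer4 = r.layer3, r.layer4   # 256, 512 channels
        # 1x1 lateral convolutions project each level to a common width
        self.lat2 = nn.Conv2d(128, out_ch, 1)
        self.lat3 = nn.Conv2d(256, out_ch, 1)
        self.lat4 = nn.Conv2d(512, out_ch, 1)

    def forward(self, x):
        c1 = self.layer1(self.stem(x))
        c2 = self.layer2(c1)
        c3 = self.layer3(c2)
        c4 = self.layer4(c3)
        # top-down pathway: upsample and add the lateral features
        p4 = self.lat4(c4)
        p3 = self.lat3(c3) + F.interpolate(p4, scale_factor=2, mode="nearest")
        p2 = self.lat2(c2) + F.interpolate(p3, scale_factor=2, mode="nearest")
        return p2, p3, p4

p2, p3, p4 = FPNEncoder()(torch.randn(1, 3, 512, 512))
```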
2) The class relation module consists of three parallel branches. The features of the three levels output by the encoder first pass through a 1×1 convolution layer that reduces the channel dimension to 5. This 1×1 convolution can be regarded as a classifier that maps global features into 5 channels, one per classification category, so that each channel represents the features of one target category.
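In code, this per-level "classifier" is simply a 1×1 convolution; a one-line sketch, where the 128-channel input width is an assumption carried over from the encoder sketch above:

```python
# A 1x1 convolution acting as a per-level classifier: it maps the
# (assumed) 128-channel encoder features to 5 class channels.
import torch
import torch.nn as nn

to_class_maps = nn.Conv2d(128, 5, kernel_size=1)
class_maps = to_class_maps(torch.randn(1, 128, 64, 64))  # -> (1, 5, 64, 64)
```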
Then, the feature map of each level is fed into the class feature enhancement (CFE) attention mechanism, which is divided into a channel attention module and a spatial attention module. The channel attention module assigns a weight to each category by learning the correlation between the features of each channel; to clarify the distinctions between categories, it strengthens strongly correlated features and suppresses weakly correlated ones. Channel information is encoded along the spatial dimension through global average pooling and global max pooling, and the global context is modeled to obtain a global descriptor for each channel. The spatial attention module enhances the location information of crops, such as where crops sit within the farmland: by learning spatial information along the horizontal and vertical directions, each location in the feature map is connected with its entire row and column. In this way, the CFE attention module yields more distinct features for different categories, so that the feature differences between crops are identified more reliably, and at the same time it enriches the context information of features within the same category, which helps to reduce misclassification of the same crop.
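The following PyTorch sketch shows one plausible reading of the CFE module described above: CBAM-style channel attention built from global average and max pooling, followed by a spatial branch that pools along the horizontal and vertical directions in the spirit of coordinate attention; the layer shapes and names are our assumptions.

```python
# Hypothetical sketch of the CFE attention module as described above:
# channel attention from global average + max pooling, plus a spatial
# branch that attends along horizontal and vertical directions.
import torch
import torch.nn as nn

class CFEAttention(nn.Module):
    def __init__(self, ch=5, reduction=1):
        super().__init__()
        hidden = max(ch // reduction, 1)
        # shared MLP over the pooled channel descriptors (as in CBAM)
        self.mlp = nn.Sequential(
            nn.Conv2d(ch, hidden, 1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, ch, 1))
        # 1x1 convs producing per-row / per-column attention maps
        self.conv_h = nn.Conv2d(ch, ch, 1)
        self.conv_w = nn.Conv2d(ch, ch, 1)

    def forward(self, x):
        # channel attention: fuse avg- and max-pooled global descriptors
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # spatial attention: encode each row and each column separately
        ah = torch.sigmoid(self.conv_h(x.mean(dim=3, keepdim=True)))  # N,C,H,1
        aw = torch.sigmoid(self.conv_w(x.mean(dim=2, keepdim=True)))  # N,C,1,W
        return x * ah * aw

# each 5-channel class map from the 1x1 "classifier" conv passes through CFE
out = CFEAttention(ch=5)(torch.randn(2, 5, 64, 64))
```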
3) In the decoder, the class relations of the different scales are fused and restored to the initial resolution, and the final classification is carried out by fully combining the feature information of each scale.
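A minimal sketch of such a fusion decoder, assuming the three 5-channel class-relation maps from the module above; the bilinear upsampling and the 3×3 fusion convolution are illustrative choices, not the paper's stated layers.

```python
# Hypothetical sketch of the decoder: upsample the 5-channel class-relation
# maps from all three scales to the input resolution and fuse them for the
# final per-pixel classification.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Decoder(nn.Module):
    def __init__(self, num_classes=5, levels=3):
        super().__init__()
        # fuse the concatenated multi-scale class maps into final scores
        self.fuse = nn.Conv2d(num_classes * levels, num_classes, 3, padding=1)

    def forward(self, class_maps, out_size):
        ups = [F.interpolate(m, size=out_size, mode="bilinear",
                             align_corners=False) for m in class_maps]
        return self.fuse(torch.cat(ups, dim=1))

# three 5-channel maps at strides 8/16/32 of a 512x512 input (illustrative)
maps = [torch.randn(1, 5, s, s) for s in (64, 32, 16)]
logits = Decoder()(maps, (512, 512))   # -> (1, 5, 512, 512)
```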
In addition, we use data augmentation to reduce the proportion of background samples and expand the number of samples of the other categories. To further alleviate the class imbalance in crop remote sensing images, a class-balanced loss (CB loss) function is introduced.
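CB loss, following the class-balanced loss of Cui et al. (2019) cited below, weights each class y by the inverse of its effective number of samples, (1−β^{n_y})/(1−β). A minimal sketch combined with cross-entropy; the per-class pixel counts and β = 0.999 below are illustrative only.

```python
# Minimal sketch of class-balanced cross-entropy (Cui et al., 2019):
# class y is weighted by (1 - beta) / (1 - beta ** n_y), where n_y is
# its sample count. Counts and beta below are illustrative only.
import torch
import torch.nn.functional as F

def cb_weights(samples_per_class, beta=0.999):
    n = torch.as_tensor(samples_per_class, dtype=torch.float)
    eff = (1.0 - beta ** n) / (1.0 - beta)      # effective number of samples
    w = 1.0 / eff
    return w * len(n) / w.sum()                 # normalize to sum to #classes

# 5 classes: an abundant background class plus four crop classes (made up)
weights = cb_weights([900_000, 40_000, 30_000, 20_000, 10_000])
logits = torch.randn(2, 5, 64, 64)              # N, C, H, W segmentation logits
target = torch.randint(0, 5, (2, 64, 64))
loss = F.cross_entropy(logits, target, weight=weights)
```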
Result
To verify the effectiveness of CRNet, the trained model is tested on the Barley Remote Sensing dataset, where it reaches a mean intersection over union (MIoU) of 68.89% and an overall accuracy (OA) of 82.59%. Compared with LinkNet, pyramid scene parsing network (PSPNet), DeepLabv3+, foreground-aware relation network (FarSeg), statistical texture learning network (STLNet), and feature pyramid network (FPN), CRNet improves MIoU by 7.42%, 4.86%, 4.57%, 4.36%, 4.05%, and 3.63%, respectively, and OA by 4.35%, 2.6%, 3.01%, 2.5%, 2.45%, and 1.85%, respectively. The number of parameters of CRNet is 21.98 MB and its inference speed is 68 frames/s. Compared with LinkNet and FPN, both its number of parameters and its inference speed increase, while its MIoU and OA are 7.42% and 4.35% higher than those of LinkNet, and 3.63% and 1.85% higher than those of FPN.
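For reference, both metrics can be computed from a per-class confusion matrix; a minimal sketch, where the 3-class matrix below is made up for illustration:

```python
# Minimal sketch: MIoU and OA from a per-class confusion matrix,
# where entry [i, j] counts pixels of true class i predicted as j.
import numpy as np

def miou_oa(conf):
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)
    union = conf.sum(0) + conf.sum(1) - tp      # TP + FP + FN per class
    miou = np.mean(tp / np.maximum(union, 1))
    oa = tp.sum() / conf.sum()                  # fraction of correct pixels
    return miou, oa

# made-up 3-class confusion matrix, for illustration only
print(miou_oa([[50, 2, 3], [4, 40, 1], [2, 2, 46]]))
```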
Conclusion
By combining multi-level features and introducing the class relation module, our CRNet can distinguish similar crops of different classes more accurately and identify the same crop despite large feature differences, even against the complex ground-object background of remote sensing images, and the target boundaries it extracts are clearer and more complete. The experiments show that CRNet outperforms the compared crop semantic segmentation methods.
农作物遥感图像；语义分割；类别关系模块；注意力机制；类别平衡损失函数(CB loss)；Barley Remote Sensing数据集
crop remote sensing image; semantic segmentation; category relation module; attention mechanism; class-balanced loss (CB loss); Barley Remote Sensing dataset
Adams R and Bischof L. 1994. Seeded region growing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(6): 641-647[DOI: 10.1109/34.295913]
Badrinarayanan V, Kendall A and Cipolla R. 2017. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12): 2481-2495[DOI: 10.1109/TPAMI.2016.2644615]
Chaurasia A and Culurciello E. 2017. LinkNet: exploiting encoder representations for efficient semantic segmentation//Proceedings of 2017 IEEE Visual Communications and Image Processing (VCIP). St. Petersburg, USA: IEEE: 1-4[DOI: 10.1109/VCIP.2017.8305148]
Chen L C, Zhu Y K, Papandreou G, Schroff F and Adam H. 2018. Encoder-decoder with atrous separable convolution for semantic image segmentation//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 833-851[DOI: 10.1007/978-3-030-01234-2_49]
Cui Y, Jia M L, Lin T Y, Song Y and Belongie S. 2019. Class-balanced loss based on effective number of samples//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 9260-9269[DOI: 10.1109/CVPR.2019.00949]
Glorot X, Bordes A and Bengio Y. 2011. Deep sparse rectifier neural networks//Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. Fort Lauderdale, USA: JMLR: 315-323
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778[DOI: 10.1109/CVPR.2016.90]
Hou H Y, Gao T and Li T. 2019. A survey of image segmentation. Computer Knowledge and Technology, (5): 176-177
侯红英, 高甜, 李桃. 2019. 图像分割方法综述. 电脑知识与技术, (5): 176-177[DOI: 10.14004/j.cnki.ckt.2019.0432]
Hou Q B, Zhou D Q and Feng J S. 2021. Coordinate attention for efficient mobile network design//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 13708-13717[DOI: 10.1109/CVPR46437.2021.01350]
Hu J, Shen L and Sun G. 2018. Squeeze-and-excitation networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7132-7141[DOI: 10.1109/CVPR.2018.00745]
Ioffe S and Szegedy C. 2015. Batch normalization: accelerating deep network training by reducing internal covariate shift[EB/OL]. [2021-08-20]. https://arxiv.org/pdf/1502.03167.pdf
Lian C. 2021. Remote sensing technology makes agricultural production smarter. China Agri-Production News, (6): #15
炼晨. 2021. 遥感技术让农业生产更"智慧". 中国农资, (6): #15
Lin T Y, Dollár P, Girshick R, He K M, Hariharan B and Belongie S. 2017. Feature pyramid networks for object detection//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 936-944[DOI: 10.1109/CVPR.2017.106]
Lin T Y, Goyal P, Girshick R, He K M and Dollár P. 2020. Focal loss for dense object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(2): 318-327[DOI: 10.1109/TPAMI.2018.2858826]
Liu S. 2020. A survey of the development of threshold segmentation technology. Technology Innovation and Application, (24): 129-130
刘硕. 2020. 阈值分割技术发展现状综述. 科技创新与应用, (24): 129-130
Otsu N. 1979. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics, 9(1): 62-66[DOI: 10.1109/TSMC.1979.4310076]
Paszke A, Chaurasia A, Kim S and Culurciello E. 2016. ENet: a deep neural network architecture for real-time semantic segmentation[EB/OL]. [2021-08-20]. https://arxiv.org/pdf/1606.02147v1.pdf
Pena J, Tan Y M and Boonpook W. 2019. Semantic segmentation based remote sensing data fusion on crops detection. Journal of Computer and Communications, 7(7): 53-64[DOI: 10.4236/jcc.2019.77006]
Prewitt J M S. 1970. Object Enhancement and Extraction Picture Processing and Psychopictorics. New York: Academic Press
Roberts L G. 1963. Machine Perception of Three-Dimensional Solids. Cambridge: Massachusetts Institute of Technology
Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer: 234-241[DOI: 10.1007/978-3-319-24574-4_28]
Shelhamer E, Long J and Darrell T. 2017. Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4): 640-651[DOI: 10.1109/TPAMI.2016.2572683]
Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V and Rabinovich A. 2015. Going deeper with convolutions//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE: 1-9[DOI: 10.1109/CVPR.2015.7298594]
Tan M X and Le Q V. 2020. EfficientNet: rethinking model scaling for convolutional neural networks[EB/OL]. [2021-08-20]. https://arxiv.org/pdf/1905.11946.pdf
Wang Q P, Zhang Z X and Zhu X F. 2019. Image segmentation: a survey. Information Recording Materials, 20(7): 12-14
王秋萍, 张志祥, 朱旭芳. 2019. 图像分割方法综述. 信息记录材料, 20(7): 12-14[DOI: 10.16009/j.cnki.cn13-1295/tq.2019.07.005]
Woo S, Park J, Lee J Y and Kweon I S. 2018. CBAM: convolutional block attention module//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 3-19[DOI: 10.1007/978-3-030-01234-2_1]
Xu X L. 2020. Semantic Segmentation of Remote Sensing Images Based on Deep Learning. Lanzhou: Lanzhou University
徐馨兰. 2020. 基于深度学习的遥感影像语义分割应用. 兰州: 兰州大学[DOI: 10.27204/d.cnki.glzhu.2020.002547]
Zhang S Q. 2020. Crop Classification and Identification Based on UAV Remote Sensing Image with Deep Learning. Chengdu: Chengdu University of Technology
张诗琪. 2020. 基于深度学习的无人机遥感影像农作物分类识别. 成都: 成都理工大学[DOI: 10.26986/d.cnki.gcdlc.2020.000645]
Zhao H S, Shi J P, Qi X J, Wang X G and Jia J Y. 2017. Pyramid scene parsing network//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 6230-6239[DOI: 10.1109/CVPR.2017.660]
Zheng Z, Zhong Y F, Wang J J and Ma A L. 2020. Foreground-aware relation network for geospatial object segmentation in high spatial resolution remote sensing imagery//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 4095-4104[DOI: 10.1109/CVPR42600.2020.00415]
Zhu L Y, Ji D Y, Zhu S P, Gan W H, Wu W and Yan J J. 2021. Learning statistical texture for semantic segmentation[EB/OL]. [2021-08-20]. https://arxiv.org/pdf/2103.04133.pdf