Saliency guided object complementary hiding for weakly supervised semantic segmentation
2024, Vol. 29, No. 4, Pages 1041-1055
Print publication date: 2024-04-16
DOI: 10.11834/jig.230156
Bai Xuefei, Lu Libin, Wang Wenjian. 2024. Saliency guided object complementary hiding for weakly supervised semantic segmentation. Journal of Image and Graphics, 29(04):1041-1055
Objective
Image-level weakly supervised semantic segmentation trains a segmentation network with only class labels, which greatly reduces annotation cost. Most existing methods use class activation maps (CAMs) to locate target objects; however, conventional CAMs only uncover the most discriminative regions of an object, so a segmentation network trained directly on them as pseudo labels performs poorly. This paper proposes a saliency-guided weakly supervised semantic segmentation algorithm that improves segmentation performance by first obtaining more complete CAMs.
Method
First, guided by the saliency map, the target object is complementarily and randomly hidden to obtain a pair of complementary images; the CAMs of the pair are fused and used as supervision, strengthening the network's ability to produce complete CAMs. Second, a dual attention refinement module is introduced, which uses global information to correct the CAMs and generate pseudo labels for training the segmentation network. Finally, a label iterative refinement strategy combines the segmentation network's initial predictions, the CAMs, and the saliency maps to generate more accurate pseudo labels and iteratively retrain the segmentation network.
Result
CAM generation and semantic segmentation experiments are conducted on the PASCAL VOC 2012 (pattern analysis, statistical modeling and computational learning visual object classes 2012) dataset. The generated CAMs are more complete, with a 10.21% gain in mean intersection over union (mIoU). The segmentation results surpass all compared methods, with a 6.9% mIoU gain. A multi-object semantic segmentation experiment on the COCO 2014 (common objects in context 2014) dataset yields a 0.5% mIoU gain.
Conclusion
The proposed algorithm obtains more complete CAMs, alleviates the lack of supervision information in weakly supervised semantic segmentation, and improves the accuracy of weakly supervised segmentation models.
Objective
Fully supervised semantic segmentation based on deep learning has made remarkable progress, enabling practical applications such as autonomous driving and medical image analysis. However, it depends on complete pixel-wise annotation, and building large-scale pixel-wise annotated datasets requires considerable human labor and resources. To reduce the reliance on accurate annotations, researchers have recently studied semantic segmentation under cheaper forms of supervision, such as bounding boxes, scribbles, points, and image-level labels. Weakly supervised semantic segmentation based on image-level labels trains the segmentation network with only category labels, which significantly reduces annotation cost. Most existing weakly supervised methods use class activation maps (CAMs) to locate target objects. On the one hand, the CAM generated by a classification network is sparse and focuses only on the most discriminative regions of objects; it also contains mis-activated pixels, which can misguide the subsequent segmentation task. On the other hand, the performance of the segmentation network depends on the quality of the pseudo labels: accurate pseudo labels require the shape and boundary of objects, but this information cannot be obtained directly from image-level labels, so the quality of pseudo labels is hard to guarantee. This paper proposes a saliency-guided weakly supervised semantic segmentation algorithm that obtains more complete CAMs and thereby improves the performance of the segmentation model.
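The CAM referred to here is the classic construction of Zhou et al. (2016): the last convolutional feature maps are weighted by the classifier weights of one class and summed. A minimal NumPy sketch (function and variable names are illustrative, not taken from the paper):

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Compute a CAM as the weighted sum of the final conv feature maps,
    using the global-pooling classifier's weights for one class.

    features:   (C, H, W) feature maps from the last conv layer
    fc_weights: (num_classes, C) fully connected classifier weights
    class_idx:  index of the class whose activation map is wanted
    """
    w = fc_weights[class_idx]                         # (C,)
    cam = np.tensordot(w, features, axes=([0], [0]))  # (H, W) weighted sum
    cam = np.maximum(cam, 0)                          # keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                         # normalize to [0, 1]
    return cam
```

The normalized map highlights where the classifier's evidence for that class is strongest; as the abstract notes, this tends to be only the most discriminative part of the object, not its full extent.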
Method
First, prior work shows that randomly hiding parts of the target in an image can strengthen the network's ability to locate the complete target. However, directly hiding random regions discards part of the image information, whereas complementary hiding uses all of it. Because of the randomness of the hiding, there is still no guarantee that the target object itself is hidden; in some cases only background regions are hidden. This paper therefore proposes a saliency-guided object complementary hiding method: using the foreground information provided by the saliency map, the object in the image is complementarily and randomly hidden to obtain a pair of complementary images, and the CAMs of the pair are fused as supervision to improve the network's ability to produce complete CAMs. Second, the convolution operations in the classification network used to generate CAMs have local receptive fields, so the features of same-class objects can differ with changes in scale, illumination, and viewing angle. This intra-class inconsistency harms activation and leads to mis-activation in the CAM. In addition, the classification network itself is weak at extracting complete objects, so object complementary hiding guided by saliency alone still struggles to expand the object region sufficiently. A dual attention refinement module is therefore introduced to further correct the CAM with global information, and the corrected CAM is used to generate pseudo labels for training the segmentation network. The predictions of the segmentation network are more accurate than the original pseudo labels, but they still contain noise, so directly using them for iterative training does not guarantee improved segmentation performance. Finally, a label iterative refinement strategy combines the initial predictions of the segmentation network, the CAMs, and the saliency maps to generate pseudo labels and iteratively trains the segmentation network. Saliency maps separate foreground from background effectively but cannot identify object categories; CAMs locate object categories accurately but lack complete shape information; segmentation predictions provide relatively complete object boundaries but may contain misclassified pixels. By fully exploiting the complementary information of these three maps to refine the pseudo labels, the impact of pixel misclassification is markedly reduced.
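The saliency-guided complementary hiding step can be sketched as follows, under simplifying assumptions: a grid-cell hiding scheme and a hard binary saliency mask. Function names and the grid-based scheme are illustrative, not the paper's exact implementation:

```python
import numpy as np

def complementary_hide(image, saliency, grid=4, rng=None):
    """Split the salient (foreground) grid cells of an image into two
    random complementary sets and hide each set in one of two copies,
    producing a complementary image pair. Every foreground cell is hidden
    in exactly one copy; background cells stay visible in both, so all
    image information is used across the pair.

    image:    (H, W, 3) array; saliency: (H, W) binary foreground mask.
    """
    rng = np.random.default_rng(rng)
    h, w = saliency.shape
    a, b = image.copy(), image.copy()
    ch, cw = h // grid, w // grid
    for i in range(grid):
        for j in range(grid):
            ys = slice(i * ch, (i + 1) * ch)
            xs = slice(j * cw, (j + 1) * cw)
            if saliency[ys, xs].mean() > 0.5:        # cell is mostly foreground
                target = a if rng.random() < 0.5 else b
                target[ys, xs] = 0                   # hide in exactly one copy
    return a, b

def fuse_cams(cam_a, cam_b):
    """Pixel-wise maximum fusion of the pair's CAMs as the supervision target."""
    return np.maximum(cam_a, cam_b)
```

Because each hidden copy forces the classifier to rely on the object parts that remain visible, the max-fused CAM of the pair covers more of the object than the CAM of the original image.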
Result
The experiments are divided into two parts to verify the effectiveness of the algorithm. The first part evaluates the proposed CAM generation method against other approaches; the second part compares the proposed method with several classical weakly supervised semantic segmentation algorithms and analyzes the contribution of each module through ablation experiments. Experiments are first conducted on the PASCAL VOC 2012 dataset. The CAM generated by the proposed algorithm is more complete, and its mean intersection over union (mIoU) improves by 10.21% over the baseline. The segmentation network also produces better predictions than the six compared methods, a 6.9% improvement over the baseline, and outperforms the other methods in 13 categories. With an mIoU of 92% on the background category, the proposed method achieves the highest score among the compared methods, indicating effective use of saliency maps during training. A multi-object semantic segmentation experiment is also conducted on the COCO 2014 dataset. Compared with PASCAL VOC 2012, this dataset has richer categories and many more images containing multiple object classes, placing high demands on the algorithm. Experimental results show that mIoU improves by 0.5% on COCO 2014.
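The mIoU metric used in these comparisons is the per-class intersection-over-union, averaged over the classes present. A small sketch of the standard computation (names are illustrative):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean intersection over union between a predicted label map and
    ground truth, averaged over classes that appear in either map."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:                     # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))
```

A perfect prediction scores 1.0; reported improvements such as the 10.21% CAM gain are differences in this score against the baseline.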
Conclusion
The proposed algorithm obtains more complete CAMs, effectively alleviates the problem of insufficient supervision information, and improves the accuracy of weakly supervised semantic segmentation models.
deep learning; weakly supervised semantic segmentation; saliency guidance; class activation map (CAM); attention mechanism