Semantic Segmentation of Light Field Angle Cues Representation
2024, Pages: 1-11
Online publication date: 2024-12-23
DOI: 10.11834/jig.240391
Cheng Xinyi, Jia Chen, Zhang Zixuan, et al. Semantic Segmentation of Light Field Angle Cues Representation[J]. Journal of Image and Graphics,
Objective
Current light field semantic segmentation methods are limited to single objects, rely on hand-crafted features with poor robustness, and lack high-level angular semantic information. To address these shortcomings, this paper proposes an end-to-end semantic segmentation network for static images that fully exploits the capacity of deep convolutional neural networks to represent light field image features and explores spatial and angular structural relationships to alleviate over-segmentation and under-segmentation.
Method
Starting from the construction of multi-scale light field macro-pixel images and building on several backbone network designs, this paper proposes a light field semantic segmentation model that combines an efficient angular feature extractor (AFE) with atrous spatial pyramid pooling (ASPP). In the encoder module, ASPP efficiently extracts and fuses multi-scale spatial features from the macro-pixel image, improving the model's adaptability to complex scenes; in the decoder, the AFE extracts angular structural cues from the macro-pixel image, reducing the loss of angular information that occurs during successive downsampling.
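Concretely, the macro-pixel representation interleaves the angular samples of each spatial position into a K×K block. Below is a minimal sketch of this construction, not the authors' code; the function name and the 9×9 angular / 64×64 spatial resolutions are illustrative assumptions.

```python
# Minimal sketch of macro-pixel image construction (illustrative only).
# lf[u, v, x, y, c] is a sub-aperture stack with U x V views; pixels sharing
# a spatial position (x, y) across all views are grouped into one K x K
# macro-pixel, so a K x K convolution with stride K can later read out the
# purely angular variation at that position.
import numpy as np

def to_macro_pixel(lf: np.ndarray) -> np.ndarray:
    """(U, V, X, Y, C) sub-aperture stack -> (X*U, Y*V, C) macro-pixel image."""
    U, V, X, Y, C = lf.shape
    mpi = lf.transpose(2, 0, 3, 1, 4)   # -> (X, U, Y, V, C)
    return mpi.reshape(X * U, Y * V, C)

lf = np.random.rand(9, 9, 64, 64, 3)    # 9 x 9 views, 64 x 64 pixels each
print(to_macro_pixel(lf).shape)          # (576, 576, 3)
```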
Result
In experiments on the open-source LF Dataset against seven of the latest state-of-the-art (SOTA) light field methods, the proposed model with a ResNet101 backbone achieves a mean intersection over union (mIoU) of 88.80% on the test set, the best result among all compared methods.
Conclusion
The proposed model is feasible and effective for improving semantic segmentation performance: it captures subtle variations in the image more precisely and produces more accurate boundary segmentation, offering a new research direction for applying light field technology to scene understanding.
Objective Light field images are high-dimensional data capturing multi-view information of scenes, encompassing rich geometric and angular details. In light field semantic segmentation, the goal is to assign semantic labels to each pixel in the light field image, distinguishing different objects or parts of objects. Traditional 2D or 3D image segmentation methods often struggle with challenges such as variations in illumination, shadows, and occlusions when applied to light field images, leading to reduced segmentation accuracy and poor robustness. Leveraging angular and geometric information inherent in light field images, light field semantic segmentation aims to overcome these challenges and improve segmentation performance. However, existing algorithms are typically designed for RGB (red green blue) or RGB-depth image inputs and do not effectively utilize the structural information of light fields for semantic segmentation. Moreover, previous studies mainly focus on handling redundant light field data or manually crafted features, and the highly coupled four-dimensional nature of light field data poses a barrier to conventional CNN (convolutional neural network) modeling approaches. Additionally, prior works have primarily focused on object localization and segmentation in planar spatial positions, lacking detailed angular semantic information for each object. Therefore, we propose a CNN-based light field semantic segmentation network for processing macro-pixel light field images. Our approach incorporates AFE (angular feature extractor) to learn angular variations between different views within the light field image and employs dilated convolution operations to enhance semantic correlations across multiple channels.
Method
The article proposes an end-to-end semantic segmentation network for static images, starting from the construction of multi-scale light field macro-pixel images and building on various backbone networks and dilated convolutions. Efficiently extracting spatial features from macro-pixel images is challenging; this is addressed by employing ASPP (atrous spatial pyramid pooling) in the encoder module to extract high-level fused semantic features. In the experiments, the dilation rates of the ASPP module are set to r = [12, 24, 36], enriching the spatial features captured at a fixed feature-map size and yielding better segmentation results; convolutions with different dilation rates efficiently extract multi-scale spatial features. In the decoder module, feature modeling enhances the nonlinear expression of low-level semantic features in the macro-pixel image and the representation of channel correlations. Semantic features from the encoder are upsampled by a factor of four and concatenated with the features generated by the angular feature extractor (AFE) to strengthen the interaction between features in the network. These features are further refined through 3×3 convolutions, combining angular and spatial features for a richer feature representation, and the segmentation result is output after another fourfold upsampling. To enhance the expression of light field features and fully extract the rich angular information in light field macro-pixel images, the AFE is introduced in the decoder stage. The AFE operates as a special convolution with kernel size K×K, stride K, and dilation rate 1, where K equals the angular resolution of the light field. Its input features are taken from the Conv2_x layer of ResNet-101, preserving the complete low-level macro-pixel image features. This design is crucial for capturing the angular relationships between pixels across sub-aperture images and avoids the loss of angular information during consecutive downsampling. Incorporating angular features enables the model to better distinguish boundaries between categories and produce more accurate segmentation results. In complex scenarios such as uneven illumination, occlusion, or small objects, ASPP extracts a broader context while the AFE captures complementary angular information between macro-pixels; their synergistic effect significantly enhances semantic segmentation performance.
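A minimal PyTorch sketch of the two modules as described above follows; the channel widths, the single 1×1 fusion branch, and the ReLU are illustrative assumptions rather than the authors' exact configuration.

```python
# Sketch of ASPP with dilation rates r = [12, 24, 36] and of the AFE as a
# K x K convolution with stride K and dilation 1 (K = angular resolution).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Parallel atrous convolutions plus a 1x1 branch, fused by projection."""
    def __init__(self, in_ch: int, out_ch: int = 256, rates=(12, 24, 36)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1, bias=False)] +
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False)
             for r in rates])
        self.project = nn.Conv2d(out_ch * (1 + len(rates)), out_ch, 1, bias=False)

    def forward(self, x):
        # Every branch preserves spatial size; concatenate along channels.
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

class AFE(nn.Module):
    """Angular feature extractor: each output pixel sees exactly one K x K
    macro-pixel, i.e. the angular samples of a single spatial position."""
    def __init__(self, in_ch: int, out_ch: int, K: int = 9):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=K, stride=K, dilation=1)

    def forward(self, x):
        return F.relu(self.conv(x))

# Shape check on a 9 x 9-view macro-pixel feature map (64 x 64 positions).
feat = torch.randn(1, 64, 576, 576)
print(ASPP(64)(feat).shape)            # torch.Size([1, 256, 576, 576])
print(AFE(64, 256, K=9)(feat).shape)   # torch.Size([1, 256, 64, 64])
```

With stride K and dilation 1, the AFE's receptive field never straddles two macro-pixels, which is what lets it isolate angular structure from spatial context.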
Result
To assess the performance of the proposed model, quantitative and qualitative comparison experiments were conducted on the LF Dataset against the light field methods mentioned in the paper. To ensure a fair comparison, the baseline parameters introduced in the article were used as benchmarks. Compared with the SOTA (state-of-the-art) methods, the model achieved an mIoU (mean intersection over union) of 88.80% on the test set, outperforming all comparison methods. Furthermore, relative to all other baseline methods, the proposed approach improved performance by more than 2.15%, capturing subtle changes in images more precisely and thus yielding more accurate segmentation boundaries. Compared with five other semantic segmentation methods, the approach showed a clear advantage in boundary accuracy. A series of ablation experiments was also conducted to investigate the contributions of the AFE and the multi-scale ASPP. Removing both ASPP and AFE caused mIoU to fall to 22.51%, a substantial drop of 66.29 percentage points, demonstrating that the complete model integrating ASPP and AFE effectively exploits multi-scale information and angular features to achieve optimal segmentation performance. Specifically, removing the multi-scale ASPP decreased performance by 6.58%, since features obtainable at a single scale lack multi-scale semantic supplements. Similarly, removing the AFE caused a performance drop of 2.36%, owing to the absence of the guided angular-cue features needed to capture light-field-specific information. It can therefore be concluded that the synergistic effect of the AFE and the multi-scale ASPP significantly enhances semantic segmentation performance. Four popular backbone networks, ResNet101, DRN, MobileNet, and Xception, were tested to determine the optimal backbone for the proposed algorithm; with ResNet101, the highest mIoU of 88.80% was obtained.
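For reference, mIoU here is the standard class-averaged intersection over union; a minimal sketch, independent of the authors' evaluation code:

```python
# Standard mean intersection over union over integer label maps; classes
# absent from both prediction and ground truth are skipped.
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.array([[0, 1], [1, 1]])
gt   = np.array([[0, 1], [0, 1]])
print(mean_iou(pred, gt, 2))   # (1/2 + 2/3) / 2 = 0.5833...
```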
Conclusion
Existing methods struggle to delineate object boundaries accurately; this limitation arises from their inability to utilize the angular information in light field images. The approach proposed in this study performs better on overall image segmentation tasks, effectively mitigating over-segmentation and under-segmentation. Specifically, by introducing the AFE and ASPP, the proposed method captures subtle changes in images more accurately, thereby achieving more precise segmentation boundaries. Compared with five other semantic segmentation methods, it demonstrates a clear advantage in boundary accuracy. The paper introduces a novel light field image semantic segmentation method that takes macro-pixel light field images as input and performs end-to-end semantic segmentation. To extract the angular features of the light field and enhance the nonlinearity of macro-pixel image features, a simple and efficient angular feature extraction model is designed and integrated into the network. The proposed model is evaluated against SOTA algorithms and, owing to an efficient network architecture that captures rich structural cues of light fields, achieves the highest mIoU score of 88.80% in semantic segmentation tasks. The experimental results demonstrate the feasibility and effectiveness of the proposed model in enhancing semantic segmentation performance, offering a new research direction for the application of light field technology in scene understanding.
semantic segmentation; light field imaging; macro-pixel image; angle cues; atrous convolution
Berent J and Dragotti P L. 2006. Segmentation of Epipolar-Plane Image Volumes with Occlusion and Disocclusion Competition. IEEE Workshop on Multimedia Signal Processing. Victoria, Canada: IEEE: 182-185 [DOI: 10.1109/MMSP.2006.285293]
Campbell N D, Vogiatzis G, Hernández C and Cipolla R. 2007. Automatic 3D object segmentation in multiple views using volumetric graph-cuts. Image and Vision Computing, 28: 14-25 [DOI: 10.1016/j.imavis.2008.09.005]
Chen C Y, Fan X H, Hu X J and Yu H Y. 2023. Light-field angular super-resolution reconstruction via fusing 3D epipolar plane images. Optics and Precision Engineering, 31(21): 3167-3177 [DOI: 10.37188/ope.20233121.3167]
Chen L, Papandreou G, Kokkinos I, Murphy K P and Yuille A L. 2016. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40: 834-848 [DOI: 10.1109/TPAMI.2017.2699184]
Chen L, Papandreou G, Schroff F and Adam H. 2017. Rethinking Atrous Convolution for Semantic Image Segmentation [EB/OL]. [2017-06-17]. https://arxiv.org/pdf/1706.05587v2.pdf
Chen X, Dai F, Ma Y and Zhang Y. 2015. Automatic foreground segmentation using light field images. 2015 Visual Communications and Image Processing (VCIP). Singapore: IEEE: 1-4 [DOI: 10.1109/VCIP.2015.7457895]
Chollet F. 2016. Xception: Deep Learning with Depthwise Separable Convolutions. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Hawaii, USA: IEEE: 1800-1807 [DOI: 10.1109/CVPR.2017.195]
Fu J, Liu J, Tian H, Fang Z and Lu H. 2018. Dual Attention Network for Scene Segmentation. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 3141-3149 [DOI: 10.1109/CVPR.2019.00326]
He K, Zhang X, Ren S and Sun J. 2016. Deep Residual Learning for Image Recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Hog M, Sabater N and Guillemot C M. 2016. Light Field Segmentation Using a Ray-Based Graph Structure. European Conference on Computer Vision (ECCV). Amsterdam, Netherlands: Springer: 35-50 [DOI: 10.1007/978-3-319-46478-7_3]
Howard A G, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M and Adam H. 2017. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications [EB/OL]. [2017-04-17]. https://arxiv.org/pdf/1704.04861.pdf
Jia C, Shi F, Zhao M, Zhang Y, Cheng X, Wang M and Chen S. 2021. Semantic Segmentation With Light Field Imaging and Convolutional Neural Networks. IEEE Transactions on Instrumentation and Measurement, 70: 1-14 [DOI: 10.1109/TIM.2021.3115204]
Jia C, Shi F, Zhao M and Chen S. 2022. Light field imaging for computer vision: a survey. Frontiers of Information Technology & Electronic Engineering, 23: 1077-1097 [DOI: 10.1631/FITEE.2100180]
Liu B B and Hua B. 2019. Semi-supervised image semantic segmentation based on encoder-decoder. Computer Systems & Applications, 28(11): 182-187 [DOI: 10.15888/j.cnki.csa.007159]
Lu J W, Shi J L, Zhu H W, Sun Y H and Cheng Z G. 2024. Depth guidance unsupervised domain adaptation for semantic segmentation. Journal of Computer-Aided Design & Computer Graphics, 36(1): 133-141 [DOI: 10.3724/SP.J.1089.2024.19824]
Matysiak P, Grogan M, Aenchbacher W and Smolic A. 2020. Soft Colour Segmentation on Light Fields. 2020 IEEE International Conference on Image Processing (ICIP). Abu Dhabi, United Arab Emirates: IEEE: 2621-2625 [DOI: 10.1109/ICIP40778.2020.9190716]
Mihara H, Funatomi T, Tanaka K, Kubo H, Mukaigawa Y and Nagahara H. 2016. 4D light field segmentation with spatial and angular consistencies. IEEE International Conference on Computational Photography (ICCP). Evanston, USA: IEEE: 1-8 [DOI: 10.1109/ICCPHOT.2016.7492872]
Shelhamer E, Long J and Darrell T. 2017. Fully Convolutional Networks for Semantic Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39: 640-651 [DOI: 10.1109/TPAMI.2016.2572683]
Sheng H, Cong R, Yang D, Chen R, Wang S and Cui Z. 2022. UrbanLF: A Comprehensive Light Field Dataset for Semantic Segmentation of Urban Scenes. IEEE Transactions on Circuits and Systems for Video Technology, 32: 7880-7893 [DOI: 10.1109/TCSVT.2022.3187664]
Simonyan K and Zisserman A. 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition [EB/OL]. [2014-09-04]. https://arxiv.org/pdf/1409.1556.pdf
Teng F, Zhang J, Peng K, Yang K, Wang Y and Stiefelhagen R. 2023. OAFuser: Towards Omni-Aperture Fusion for Light Field Semantic Segmentation of Road Scenes [EB/OL]. [2023-07-28]. https://arxiv.org/pdf/2307.15588.pdf
Wang X Y, Ma J Y, Gao W J and Jiang J J. 2021. MPIN: Macro-pixel aggregation based light field image super-resolution network. Frontiers of Information Technology & Electronic Engineering, 22(10): 1299-1310 [DOI: 10.1631/FITEE.2000566]
Wang Z and Qu S J. 2024. Research progress and challenges in real-time semantic segmentation for deep learning. Journal of Image and Graphics, 29(5): 1188-1220 [DOI: 10.11834/jig.230605]
Wanner S, Straehle C N and Goldlücke B. 2013. Globally Consistent Multi-label Assignment on the Ray Space of 4D Light Fields. 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Portland, USA: IEEE: 1011-1018 [DOI: 10.1109/CVPR.2013.135]
Wei J F, Wang H L, Du X and Wang S J. 2023. Imaging quality prediction and measurement for optical microlens array. Acta Optica Sinica, 43(4): 11 [DOI: 10.3788/AOS221605]
Xu Y, Nagahara H, Shimada A and Taniguchi R. 2019. TransCut2: Transparent Object Segmentation From a Light-Field Image. IEEE Transactions on Computational Imaging, 5: 465-477 [DOI: 10.1109/TCI.2019.2893820]
Yan Y, Deng C, Li L, Zhu L K and Ye B. 2023. Survey of image semantic segmentation methods in the deep learning era. Journal of Image and Graphics, 28(11): 3342-3362 [DOI: 10.11834/jig.220292]
Yang C, Guo H and Yang Z. 2022. A Method of Image Semantic Segmentation Based on PSPNet. Mathematical Problems in Engineering, 2022: 1-9 [DOI: 10.1155/2022/8958154]
Yang D, Zhu T, Wang S, Wang S and Xiong Z. 2022. LFRSNet: A robust light field semantic segmentation network combining contextual and geometric features. Frontiers in Environmental Science [DOI: 10.3389/fenvs.2022.996513]
Yang M, Yu K, Zhang C, Li Z and Yang K. 2018. DenseASPP for Semantic Segmentation in Street Scenes. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA: IEEE: 3684-3692 [DOI: 10.1109/CVPR.2018.00388]
Yu F and Koltun V. 2016. Multi-Scale Context Aggregation by Dilated Convolutions. ICLR [EB/OL]. [2015-11-23]. https://arxiv.org/pdf/1511.07122.pdf
Yuan Y H, Chen X L and Wang J D. 2020. Object-Contextual Representations for Semantic Segmentation. European Conference on Computer Vision (ECCV). Glasgow, UK: Springer: 173-190 [DOI: 10.1007/978-3-030-58539-6_11]
Zhao W, Ning Y, Jia X, Chai D, Su F and Wang S. 2024. A Rapid Segmentation Method of Highway Surface Point Cloud Data Based on a Supervoxel and Improved Region Growing Algorithm. Applied Sciences, 14(7): 2852 [DOI: 10.3390/app14072852]