Nonlocal feature representation-embedded blurred image restoration
2024, Vol. 29, No. 10, Pages 3033-3046
Print publication date: 2024-10-16
DOI: 10.11834/jig.230735
Hua Xia, Shu Ting, Li Mingxin, Shi Yu, Hong Hanyu. 2024. Nonlocal feature representation-embedded blurred image restoration. Journal of Image and Graphics, 29(10): 3033-3046
Objective
End-to-end single-image deblurring methods based on deep learning have achieved excellent results. However, the building blocks of most networks focus only on extracting local features and show limitations in modeling long-range pixel dependencies. To address this problem, we propose a method that introduces both local and nonlocal features into the network.
Method
Existing high-performing building blocks are adopted to extract local features. A large-window Transformer block is divided into smaller non-overlapping patches, and only one maximum-value point is sampled from each patch to participate in the self-attention computation, so that nonlocal features are extracted without consuming excessive computational resources. Finally, the two modules are applied in combination, coupling local and nonlocal information within a block to effectively capture richer feature information.
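To make the sampling step concrete, the following is a minimal PyTorch sketch of the idea described above: the feature map is split into non-overlapping patches and a single maximum-value point per patch is kept as a token for self-attention. This is an illustrative reconstruction, not the authors' released code; in particular, the saliency score used to pick the point (here, the channel-mean activation) and the patch size of 8 are assumptions.

```python
# Minimal sketch (PyTorch) of patch-wise max-point sampling. Assumptions:
# the point is chosen by channel-mean activation, and the patch size is 8;
# the paper's actual criterion and size may differ.
import torch
import torch.nn.functional as F

def sample_max_points(x: torch.Tensor, patch: int = 8) -> torch.Tensor:
    """x: (B, C, H, W) feature map -> (B, N, C) tokens, one per patch."""
    b, c, h, w = x.shape
    score = x.mean(dim=1, keepdim=True)                       # (B, 1, H, W)
    _, idx = F.max_pool2d(score, patch, return_indices=True)  # flat spatial indices
    idx = idx.flatten(2)                                      # (B, 1, N)
    tokens = x.flatten(2).gather(2, idx.expand(-1, c, -1))    # (B, C, N)
    return tokens.transpose(1, 2)                             # (B, N, C)

# A 64×64 map with 8×8 patches yields only 64 tokens for self-attention,
# instead of 4 096 pixel tokens.
tokens = sample_max_points(torch.randn(1, 32, 64, 64))
print(tokens.shape)  # torch.Size([1, 64, 32])
```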
Result
Experiments show that, compared with modules that can extract only local information, the proposed module improves the peak signal-to-noise ratio (PSNR) by no less than 1.3 dB. Furthermore, two image restoration networks coupling local and nonlocal features are designed and applied to single-image motion deblurring and defocus deblurring, respectively. Compared with Uformer (a general U-shaped Transformer for image restoration), the average PSNR on the motion deblurring test sets GoPro (deep multi-scale convolutional neural network for dynamic scene deblurring) and HIDE (human-aware motion deblurring) improves by 0.29 dB and 0.25 dB, respectively, with fewer floating-point operations (FLOPs). On the defocus deblurring test set DPD (defocus deblurring using dual-pixel data), the average PSNR improves by 0.42 dB.
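For reference, the PSNR figures reported here follow the standard definition; a minimal sketch for images normalized to [0, 1] (the normalization range is an assumption):

```python
import torch

def psnr(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Standard PSNR: 10 * log10(MAX^2 / MSE), with MAX = 1 for [0, 1] images.
    mse = torch.mean((pred - target) ** 2)
    return 10 * torch.log10(1.0 / mse)
```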
Conclusion
The proposed method successfully introduces nonlocal information within blocks, enabling the model to capture both local and nonlocal features and obtain richer feature representations, which improves the performance of the deblurring network. The restored images also have clearer edges and are closer to the ground-truth images.
Objective
Image deblurring is a classic low-level computer vision problem that aims to restore a sharp image from a blurry one. In recent years, convolutional neural networks (CNNs) have considerably advanced computer vision, and various CNN-based deblurring methods have achieved remarkable results. Although the convolution operation is powerful in capturing local information, CNNs are limited in modeling long-range dependencies. By employing self-attention mechanisms, vision Transformers have shown a strong ability to model long-range pixel relationships. However, most Transformer models designed for vision tasks involving high-resolution images use a local-window self-attention mechanism, which contradicts the goal of employing Transformer structures to capture true long-range pixel dependencies. Reviewing deblurring models efficient enough to process high-resolution images, we find that most CNN-based and vision Transformer-based approaches can only extract spatially local features. Some studies obtain information with a larger receptive field by directly enlarging the window size, but this approach not only incurs excessive computational overhead but also lacks flexibility during feature extraction. To solve these problems, we propose a method that incorporates both local and nonlocal information into the network.
Method
We employ local feature representation (LFR) modules and nonlocal feature representation (NLFR) modules to extract enriched information. Most existing building blocks can already extract local information, so we treat such blocks directly as LFR modules. In addition, we design a generic NLFR module for extracting nonlocal information, which can be easily combined with an LFR module. The NLFR module consists of a nonlocal feature extraction (NLFE) block and an interblock transmission (IBT) mechanism. The NLFE block applies a nonlocal self-attention mechanism that avoids interference from local information and texture details, captures purer nonlocal information, and considerably reduces computational complexity. To counteract the accumulation of local information in NLFE blocks as network depth increases, we introduce the IBT mechanism between successive NLFE blocks, which provides a direct data flow for transferring nonlocal information. This design has two advantages: 1) the NLFR module ignores local texture details when extracting information, ensuring that the two kinds of information do not interfere with each other; 2) instead of computing the self-similarity of all pixels within the receptive field, the NLFR module adaptively samples salient pixels, considerably reducing computational complexity. We select LeFF and ResBlock as LFR modules to combine with the NLFR module and design two models, NLCNet_L and NLCNet_R, for motion blur removal and defocus blur removal, respectively, both built on a single-stage UNet architecture.
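As a schematic illustration of how an LFR module, the NLFE block, and the IBT data flow could fit together, consider the sketch below. It is an assumption-laden reconstruction, not the paper's implementation: the token sampler is simplified to a channel-wise max pool, the LFR module is a plain convolutional residual block, and the fusion of nonlocal context back into the feature map is a placeholder 1 × 1 convolution.

```python
# Schematic sketch (not the released NLCNet code) of coupling an LFR module
# with an NLFR module; `nl_stream` models the IBT mechanism as a residual
# token stream carried between successive NLFE blocks.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NLFEBlock(nn.Module):
    """Self-attention over sparsely sampled tokens (details are placeholders)."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        t = self.norm(tokens)
        out, _ = self.attn(t, t, t, need_weights=False)
        return tokens + out

class CoupledBlock(nn.Module):
    def __init__(self, dim: int, patch: int = 8):
        super().__init__()
        self.patch = patch
        self.lfr = nn.Sequential(                    # stand-in LFR: ResBlock-like
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, padding=1))
        self.nlfe = NLFEBlock(dim)
        self.fuse = nn.Conv2d(2 * dim, dim, 1)       # placeholder fusion

    def forward(self, x, nl_stream):
        local = x + self.lfr(x)                      # local branch
        # simplified sampler: channel-wise max per non-overlapping patch
        tokens = F.max_pool2d(x, self.patch).flatten(2).transpose(1, 2)
        nl_stream = self.nlfe(tokens + nl_stream)    # IBT: reuse earlier tokens
        ctx = nl_stream.mean(1)[:, :, None, None].expand_as(local)
        return self.fuse(torch.cat([local, ctx], dim=1)), nl_stream

# Two successive blocks sharing the nonlocal stream (the IBT data flow):
x = torch.randn(1, 32, 64, 64)
stream = torch.zeros(1, (64 // 8) ** 2, 32)
block1, block2 = CoupledBlock(32), CoupledBlock(32)
x, stream = block1(x, stream)
x, stream = block2(x, stream)
```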
Result
We verify the gain of each component of the NLFR module. The network built from the NLFR module combined with the LFR module obtains a peak signal-to-noise ratio (PSNR) gain of 0.89 dB over using only the LFR module as the building block, and adding the IBT mechanism further improves PSNR by 0.09 dB. For fair comparison, we build a baseline model using only ResBlock as the building block, with computational overhead and parameter count similar to those of the proposed network. Results demonstrate that combining the NLFR module with ResBlock is more effective for constructing a deblurring network than using ResBlock alone. Scalability experiments show that combining NLFR modules with existing building blocks, including convolutional residual blocks and a Transformer block, remarkably improves deblurring performance. In particular, the two networks built with the NLFR-combined LeFF block and ResBlock achieve excellent results in single-image motion deblurring and dual-pixel defocus deblurring compared with other methods. Following a popular training protocol, NLCNet_L was trained on the GoPro dataset for 3 000 epochs and tested on the GoPro test set, where our method achieves the best results with the lowest computational complexity; compared with the previous method Uformer, it improves PSNR by 0.29 dB. For dual-pixel defocus deblurring, we trained NLCNet_R on the DPD dataset for 200 epochs. In the combined scene category, our method achieves excellent performance on all four metrics; compared with Uformer, it improves PSNR in indoor and outdoor scenes by 1.37 dB and 0.94 dB, respectively.
Conclusion
We propose a generic NLFR module that extracts genuinely nonlocal information from images and can be coupled with local information within a block to improve the expressive ability of the model. Through rational design, networks composed of NLFR modules achieve excellent performance with low computational cost, and the recovered images, especially their edge contours, are visually clearer and more complete.
Keywords: motion blur; defocus blur; self-attention; non-local features; fusion network
Abuolaim A and Brown M S. 2020. Defocus deblurring using dual-pixel data//Proceedings of the 16th European Conference on Computer Vision — ECCV 2020. Glasgow, UK: Springer: 111-126 [DOI: 10.1007/978-3-030-58607-2_7]
Abuolaim A, Delbracio M, Kelly D, Brown M S and Milanfar P. 2021. Learning to reduce defocus blur by realistically modeling dual-pixel data//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: IEEE: 2269-2278 [DOI: 10.1109/ICCV48922.2021.00229]
Chen H T, Wang Y H, Guo T Y, Xu C, Deng Y P, Liu Z H, Ma S W, Xu C J, Xu C and Gao W. 2021b. Pre-trained image processing Transformer//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE: 12294-12305 [DOI: 10.1109/CVPR46437.2021.01212]
Chen L Y, Chu X J, Zhang X Y and Sun J. 2022. Simple baselines for image restoration//Proceedings of the 17th European Conference on Computer Vision — ECCV 2022. Tel Aviv, Israel: Springer: 17-33 [DOI: 10.1007/978-3-031-20071-7_2]
Chen L Y, Lu X, Zhang J, Chu X J and Chen C P. 2021a. HINet: half instance normalization network for image restoration//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Nashville, USA: IEEE: 182-192 [DOI: 10.1109/CVPRW53098.2021.00027]
Chen Z N, Zhang H Y, Zeng N Y and Li H. 2022. Attention mechanism embedded multi-scale restoration method for blurred image. Journal of Image and Graphics, 27(5): 1682-1696 [DOI: 10.11834/jig.210249]
Cho S J, Ji S W, Hong J P, Jung S W and Ko S J. 2021. Rethinking coarse-to-fine approach in single image deblurring//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: IEEE: 4621-4630 [DOI: 10.1109/ICCV48922.2021.00460]
Dong X Y, Bao J M, Chen D D, Zhang W M, Yu N H, Yuan L, Chen D and Guo B N. 2022. CSWin Transformer: a general vision Transformer backbone with cross-shaped windows//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE: 12114-12124 [DOI: 10.1109/CVPR52688.2022.01181]
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X H, Unterthiner T, Dehghani M, Minderer M, Heigold G and Gelly S. 2020. An image is worth 16 × 16 words: Transformers for image recognition at scale [EB/OL]. [2023-10-23]. https://arxiv.org/pdf/2010.11929.pdf
Gong D, Yang J, Liu L Q, Zhang Y N, Reid I, Shen C H, Van Den Hengel A and Shi Q F. 2017. From motion blur to motion flow: a deep learning solution for removing heterogeneous motion blur//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 3806-3815 [DOI: 10.1109/CVPR.2017.405]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Krishnan D, Tay T and Fergus R. 2011. Blind deconvolution using a normalized sparsity measure//Proceedings of 2011 CVPR. Colorado Springs, USA: IEEE: 233-240 [DOI: 10.1109/CVPR.2011.5995521]
Krizhevsky A, Sutskever I and Hinton G E. 2017. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6): 84-90 [DOI: 10.1145/3065386]
Lee J, Son H, Rim J, Cho S and Lee S. 2021. Iterative filter adaptive network for single image defocus deblurring//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE: 2034-2042 [DOI: 10.1109/CVPR46437.2021.00207]
Li Y W, Zhang K, Cao J Z, Timofte R and Van Gool L. 2021. LocalViT: bringing locality to vision Transformers [EB/OL]. [2023-10-23]. https://arxiv.org/pdf/2104.05707.pdf
Liang J Y, Cao J Z, Sun G L, Zhang K, Van Gool L and Timofte R. 2021. SwinIR: image restoration using Swin Transformer//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW). Montreal, Canada: IEEE: 1833-1844 [DOI: 10.1109/ICCVW54120.2021.00210]
Liang P W, Jiang J J, Liu X M and Ma J Y. 2022. BaMBNet: a blur-aware multi-branch network for dual-pixel defocus deblurring. IEEE/CAA Journal of Automatica Sinica, 9(5): 878-892 [DOI: 10.1109/JAS.2022.105563]
Liu Z, Lin Y T, Cao Y, Hu H, Wei Y X, Zhang Z, Lin S and Guo B N. 2021. Swin Transformer: hierarchical vision Transformer using shifted windows//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: IEEE: 9992-10002 [DOI: 10.1109/ICCV48922.2021.00986]
Nah S, Kim T H and Lee K M. 2017. Deep multi-scale convolutional neural network for dynamic scene deblurring//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 257-265 [DOI: 10.1109/CVPR.2017.35]
Pan J S, Sun D Q, Pfister H and Yang M H. 2016. Blind image deblurring using dark channel prior//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 1628-1636 [DOI: 10.1109/CVPR.2016.180]
Purohit K, Suin M, Rajagopalan A N and Boddeti V N. 2021. Spatially-adaptive image restoration using distortion-guided networks//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: IEEE: 2289-2299 [DOI: 10.1109/ICCV48922.2021.00231]
Ruan L Y, Chen B, Li J Z and Lam M. 2022. Learning to deblur using light field generated and real defocus images//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE: 16283-16292 [DOI: 10.1109/CVPR52688.2022.01582]
Schuler C J, Hirsch M, Harmeling S and Schölkopf B. 2016. Learning to deblur. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(7): 1439-1451 [DOI: 10.1109/TPAMI.2015.2481418]
Shen Z Y, Wang W G, Lu X K, Shen J B, Ling H B, Xu T F and Shao L. 2019. Human-aware motion deblurring//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea: IEEE: 5571-5580 [DOI: 10.1109/ICCV.2019.00567]
Sun J, Cao W F, Xu Z B and Ponce J. 2015. Learning a convolutional neural network for non-uniform motion blur removal//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE: 769-777 [DOI: 10.1109/CVPR.2015.7298677]
Tsai F J, Peng Y T, Lin Y Y, Tsai C C and Lin C W. 2022a. Stripformer: strip Transformer for fast image deblurring//Proceedings of the 17th European Conference on Computer Vision — ECCV 2022. Tel Aviv, Israel: Springer: 146-162 [DOI: 10.1007/978-3-031-19800-7_9]
Tsai F J, Peng Y T, Tsai C C, Lin Y Y and Lin C W. 2022b. BANet: a blur-aware attention network for dynamic scene deblurring. IEEE Transactions on Image Processing, 31: 6789-6799 [DOI: 10.1109/TIP.2022.3216216]
Tu Z Z, Talebi H, Zhang H, Yang F, Milanfar P, Bovik A and Li Y X. 2022. MAXIM: multi-axis MLP for image processing//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE: 5759-5770 [DOI: 10.1109/CVPR52688.2022.00568]
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł and Polosukhin I. 2017. Attention is all you need//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 6000-6010 [DOI: 10.5555/3295222.3295349]
Wang W H, Xie E Z, Li X, Fan D P, Song K T, Liang D, Lu T, Luo P and Shao L. 2021. Pyramid vision Transformer: a versatile backbone for dense prediction without convolutions//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: IEEE: 548-558 [DOI: 10.1109/ICCV48922.2021.00061]
Wang Z D, Cun X D, Bao J M, Zhou W G, Liu J Z and Li H Q. 2022. Uformer: a general U-shaped Transformer for image restoration//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE: 17662-17672 [DOI: 10.1109/CVPR52688.2022.01716]
Xu L, Zheng S C and Jia J Y. 2013. Unnatural L0 sparse representation for natural image deblurring//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, USA: IEEE: 1107-1114 [DOI: 10.1109/CVPR.2013.147]
Yuan K, Guo S P, Liu Z W, Zhou A J, Yu F W and Wu W. 2021. Incorporating convolution designs into visual Transformers//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: IEEE: 559-568 [DOI: 10.1109/ICCV48922.2021.00062]
Zamir S W, Arora A, Khan S, Hayat M, Khan F S and Yang M H. 2022. Restormer: efficient Transformer for high-resolution image restoration//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE: 5718-5729 [DOI: 10.1109/CVPR52688.2022.00564]
Zamir S W, Arora A, Khan S, Hayat M, Khan F S, Yang M H and Shao L. 2021. Multi-stage progressive image restoration//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE: 14816-14826 [DOI: 10.1109/CVPR46437.2021.01458]
Zamir S W, Arora A, Khan S, Hayat M, Khan F S, Yang M H and Shao L. 2023. Learning enriched features for fast image restoration and enhancement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(2): 1934-1948 [DOI: 10.1109/TPAMI.2022.3167175]
Zhang H G, Dai Y C, Li H D and Koniusz P. 2019. Deep stacked hierarchical multi-patch network for image deblurring//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 5971-5979 [DOI: 10.1109/CVPR.2019.00613]
Zhang K H, Luo W H, Zhong Y R, Ma L, Stenger B, Liu W and Li H D. 2020. Deblurring by realistic blurring//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: 2734-2743 [DOI: 10.1109/CVPR42600.2020.00281]