Blueprint separable convolution Transformer network for lightweight image super-resolution
2024, Vol. 29, No. 4, Pages: 875-889
Print publication date: 2024-04-16
DOI: 10.11834/jig.230225
Bi Xiuping, Chen Shi, Zhang Lefei. 2024. Blueprint separable convolution Transformer network for lightweight image super-resolution. Journal of Image and Graphics, 29(4): 875-889
Objective
Image super-resolution reconstruction aims to recover, from a low-resolution image, a high-resolution image with richer detail. In recent years, Transformer-based deep neural networks have achieved remarkable performance in image super-resolution, but these networks usually carry a huge number of parameters and high computational cost. To address this problem, a lightweight image super-resolution network is designed.
Method
A blueprint separable convolution Transformer network (BSTN) is proposed for lightweight image super-resolution. Based on blueprint separable convolution (BSConv), a blueprint feed-forward neural network and a blueprint multi-head self-attention are designed. A shift channel attention block (SCAB) is then designed to strengthen important channel information; it comprises shift convolution, contrast-aware channel attention, and the blueprint feed-forward neural network. Finally, a blueprint multi-head self-attention block (BMSAB) is designed, which realizes the self-attention process at low computational cost through the blueprint multi-head self-attention and the blueprint feed-forward neural network.
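As a concrete illustration, the following minimal PyTorch sketch shows a blueprint separable convolution in its BSConv-U form (a 1 × 1 pointwise convolution followed by a depthwise convolution), the building block on which the blueprint modules above are based. The class name and the choice of the BSConv-U variant are illustrative assumptions; the paper's modules may differ in detail.

```python
# Minimal sketch of blueprint separable convolution (BSConv-U variant):
# a 1x1 pointwise convolution that mixes channels, followed by a depthwise
# convolution that filters each channel spatially. Illustrative only.
import torch
import torch.nn as nn

class BSConvU(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        # Pointwise 1x1 convolution: per-pixel channel mixing, no spatial extent.
        self.pw = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        # Depthwise convolution: one k x k "blueprint" kernel per output channel.
        self.dw = nn.Conv2d(out_ch, out_ch, kernel_size,
                            padding=kernel_size // 2, groups=out_ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.dw(self.pw(x))

# Drop-in replacement for a standard 3x3 convolution:
x = torch.randn(1, 64, 48, 48)
print(BSConvU(64, 64)(x).shape)  # torch.Size([1, 64, 48, 48])
```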
Result
The proposed method is compared with 10 state-of-the-art lightweight super-resolution methods on four datasets. Quantitatively, it leads by varying margins on the different datasets while keeping both the parameter count and the number of floating-point operations at a low level. At upscaling factors of 2, 3, and 4, the peak signal-to-noise ratio (PSNR) on Set5 improves over the state-of-the-art methods by 0.11 dB, 0.16 dB, and 0.17 dB, respectively. Qualitatively, the images reconstructed by the proposed method are sharp, with small blurred regions and rich details.
Conclusion
The proposed blueprint separable convolution Transformer network (BSTN) reaches the state of the art with relatively few parameters and floating-point operations and can produce high-quality super-resolution reconstructions.
Objective
Image super-resolution aims to enhance the resolution and quality of low-resolution images, making them more visually appealing and suitable for human or machine recognition. By utilizing a series of degraded low-resolution images with coarse details, the objective is to reconstruct high-resolution images with finer details. The applications of super-resolution algorithms are vast and encompass areas such as object detection, medical pathological analysis, remote sensing satellite imagery, and security monitoring. The promising prospects of these applications have led researchers to increasingly recognize the importance of image super-resolution algorithms. With the advancement of deep learning in computer vision, deep neural networks have been successfully applied to image super-resolution, leading to significant achievements. However, the substantial number of parameters and the computational requirements of super-resolution models result in slow running speeds, limiting their practicality in real-world deployment, particularly on mobile and edge devices. To address this issue, several lightweight super-resolution models have been proposed. Among these models, the Transformer-based approach stands out because it provides rich detail information in reconstructed images. However, this type of model still suffers from computational redundancy and large model size. To overcome these challenges, this study presents a novel lightweight super-resolution network based on the Transformer architecture.
Method
A blueprint separable convolution Transformer network (BSTN) is proposed for lightweight image super-resolution. BSTN is divided into three parts: shallow feature extraction, deep feature extraction, and image reconstruction. In the shallow feature extraction stage, a 3 × 3 standard convolution is employed to extract low-level features from the input image. This initial step captures basic image information, which is also transmitted directly to the tail of the network through a long skip connection to provide residual information. The deep feature extraction component is composed of four successive residual attention Transformer groups (RATGs). The key elements within this stage are the shift channel attention block (SCAB) and the blueprint multi-head self-attention block (BMSAB). SCAB and BMSAB are combined to form the hybrid attention Transformer block (HATB); two HATBs are connected with a residual connection and followed by a standard convolution to construct an RATG. The blueprint feed-forward neural network is first designed to effectively suppress low-information features and retain only relevant, useful information; it is then introduced into the two aforementioned attention modules to efficiently extract the deep features that matter for super-resolution. SCAB consists of three major components: shift convolution, contrast-aware channel attention, and the blueprint feed-forward neural network. Shift convolution reduces the number of network parameters and performs spatial information aggregation, enabling effective information fusion across different regions of the image. The contrast-aware channel attention mechanism focuses on important channel information, enhancing the representation of crucial features. BMSAB consists of a blueprint multi-head self-attention and a blueprint feed-forward neural network, allowing self-attention to be computed with reduced complexity while low-information features are suppressed. Finally, the shallow features extracted in the earlier stage and the deep features obtained from the RATGs are added together, and the combined features are processed with pixel shuffle, a technique that rearranges feature channels to increase spatial resolution; this final step generates the reconstructed high-resolution image with improved quality and detail. With this architecture, the proposed lightweight network achieves effective feature extraction, self-attention calculation, and image reconstruction, addressing the parameter redundancy and large model size commonly encountered in Transformer-based super-resolution models. Our method is implemented in PyTorch on an NVIDIA RTX 3090 GPU. The training datasets are DIV2K and Flickr2K, which consist of 800 and 1 000 images, respectively. The batch size is set to 32, and the patch size of the training data is set to 48 × 48 pixels. The initial learning rate is set to 5 × 10⁻⁴ and updated by the Adam optimizer with a cosine descent strategy, and the total number of iterations is 10⁶.
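To fix ideas, the following is a minimal PyTorch sketch of the three-stage pipeline described above: a 3 × 3 convolution for shallow features, four RATGs for deep features, a long skip connection, and pixel-shuffle reconstruction. The RATG internals (SCAB/BMSAB) are reduced to a convolutional placeholder, and names such as BSTNSketch and the channel width of 64 are assumptions, not the paper's implementation.

```python
# Minimal sketch of the BSTN pipeline: shallow 3x3 convolution -> four residual
# attention Transformer groups (RATGs) -> long skip connection -> pixel shuffle.
# RATG internals (SCAB/BMSAB) are replaced by a simple convolutional residual
# block here; names and the channel width are illustrative assumptions.
import torch
import torch.nn as nn

class RATG(nn.Module):
    """Placeholder for a residual attention Transformer group
    (two hybrid attention Transformer blocks plus a convolution)."""
    def __init__(self, ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.GELU(),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)  # residual connection around the group

class BSTNSketch(nn.Module):
    def __init__(self, ch: int = 64, scale: int = 4, n_groups: int = 4):
        super().__init__()
        self.shallow = nn.Conv2d(3, ch, 3, padding=1)   # shallow feature extraction
        self.deep = nn.Sequential(*[RATG(ch) for _ in range(n_groups)])
        self.upsample = nn.Sequential(                  # image reconstruction
            nn.Conv2d(ch, 3 * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale))                     # rearranges channels to space

    def forward(self, lr: torch.Tensor) -> torch.Tensor:
        s = self.shallow(lr)
        d = self.deep(s) + s    # long skip connection adds shallow features back
        return self.upsample(d)

print(BSTNSketch()(torch.randn(1, 3, 48, 48)).shape)  # torch.Size([1, 3, 192, 192])
```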
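The stated training schedule can likewise be sketched as follows. The L1 loss and the random tensors standing in for DIV2K/Flickr2K batches are assumptions, since the abstract specifies neither the loss function nor the data pipeline.

```python
# Sketch of the stated training schedule: Adam, initial learning rate 5e-4 with
# cosine descent over 1e6 iterations, batch size 32, 48x48 input patches.
# The L1 loss and random tensors are stand-ins (assumptions): the abstract does
# not specify the loss function or the data-loading code.
import torch

model = BSTNSketch()  # the architecture sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10**6)
criterion = torch.nn.L1Loss()

for step in range(10**6):
    # In practice: 32 random 48x48 LR crops from DIV2K/Flickr2K with matching
    # HR crops (x4 scale -> 192x192). Random tensors keep the sketch runnable.
    lr_patch = torch.randn(32, 3, 48, 48)
    hr_patch = torch.randn(32, 3, 192, 192)
    optimizer.zero_grad()
    loss = criterion(model(lr_patch), hr_patch)
    loss.backward()
    optimizer.step()
    scheduler.step()  # cosine decay of the learning rate each iteration
    break             # remove to run the full 1e6-iteration schedule
```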
Result
The proposed method is compared with 11 state-of-the-art approaches on four datasets. According to the quantitative results, the proposed method achieves varying degrees of improvement across magnifications and datasets while keeping parameter size and floating-point operations at low levels. When the magnification factor is 2, the peak signal-to-noise ratio (PSNR) of this model ranks first on Set5, Set14, BSD100, and Urban100; on Set5 and Set14 it surpasses the second-best model by 0.11 dB and 0.08 dB, respectively. When the magnification factor is 3, the PSNR again ranks first, surpassing the second-best model by 0.16 dB on Set5 and 0.06 dB on Urban100. When the magnification factor is 4, it still ranks first and outperforms the second-place models by 0.17 dB, 0.05 dB, and 0.04 dB on Set5, BSD100, and Urban100, respectively. According to the qualitative results, the images reconstructed by the proposed method are clear, blurred areas are small, and details are rich.
Conclusion
Extensive comparative experiments and ablation studies demonstrate that the proposed BSTN not only achieves state-of-the-art super-resolution results with excellent quantitative and visual performance but also has fewer parameters and floating-point operations. In particular, the proposed blueprint multi-head self-attention can effectively perform self-attention in Transformer blocks through a concise structure. The proposed blueprint feed-forward neural network can focus on helpful information and filter out information that is useless for super-resolution, resulting in high efficiency and low cost, and it can be seamlessly integrated into other modules. Although our method performs well, its advantage in lightweightness is not yet pronounced and should be further enhanced.
image super-resolution; lightweight model; Transformer; deep learning; attention mechanism