Image Super-Resolution Reconstruction Method Based on Multi-Scale Large-Kernel Attention Feature Fusion Network
2024, pp. 1-15
Online publication date: 2024-10-16
DOI: 10.11834/jig.240042
Song Xiaogang, Zhang Pengfei, Liu Wanbo, et al. Image Super-Resolution Reconstruction Method Based on Multi-Scale Large-Kernel Attention Feature Fusion Network[J]. Journal of Image and Graphics, 2024: 1-15.
Objective
Image super-resolution reconstruction is a fundamental task in computer vision. Convolutional neural networks extract local features effectively thanks to their sliding-window mechanism and parameter sharing, but they perceive long-range information in an image poorly. The self-attention mechanism in Transformers captures global dependencies across a sequence better, yet it incurs a heavy computational cost.
Method
To address these problems, this paper proposes MLFN, a super-resolution reconstruction method based on a multi-scale large-kernel attention feature fusion network. The network adopts a multi-path structure to learn feature representations at different levels, which strengthens its multi-scale extraction capability; a minimal sketch of this idea follows below. In addition, a multi-scale large-kernel separable convolution block is designed that combines the strong global modeling ability of self-attention with the strong local perception of convolution, so both global and local features are extracted more effectively. Finally, a lightweight normalization-based attention module is added at the end of the network, further improving performance while keeping the model lightweight.
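The following PyTorch sketch illustrates the multi-path idea under stated assumptions: the depth-wise branches, their kernel sizes (3/5/7), and the 1×1 fusion are illustrative choices, not MLFN's exact configuration.

```python
import torch
import torch.nn as nn

class MultiPathBlock(nn.Module):
    """Illustrative multi-path block: parallel depth-wise branches learn
    feature representations at different receptive fields, and a 1x1
    convolution fuses them. Kernel sizes are assumptions, not the paper's."""
    def __init__(self, channels: int):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
            for k in (3, 5, 7)  # growing receptive field per path
        ])
        self.fuse = nn.Conv2d(3 * channels, channels, 1)  # point-wise fusion

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([branch(x) for branch in self.branches], dim=1)
        return x + self.fuse(feats)  # residual connection for stable training

# Shape check: the block preserves spatial size and channel count.
x = torch.randn(1, 64, 48, 48)
assert MultiPathBlock(64)(x).shape == x.shape
```

Fusing the concatenated paths with a single 1×1 convolution keeps the extra parameter cost small while still letting the network weight each scale adaptively.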
Results
On five public benchmark datasets, the proposed method was compared with eleven representative methods. It achieves the best results at every upscaling factor: MLFN improves PSNR by 0.2 dB on average over the information multi-distillation network (IMDN), and its reconstructed images have a clear visual advantage.
Conclusion
This paper proposes a super-resolution reconstruction method based on a multi-scale large-kernel attention feature fusion network. The carefully designed multi-scale large-kernel separable convolution block effectively improves the network's ability to model long-range dependencies, the multi-path extraction block introduces multi-scale features that further raise reconstruction accuracy, and the normalization-based attention module delivers a performance gain while keeping computational cost low.
Objective
Image super-resolution reconstruction is a foundational and critical task in the field of computer vision, which aims to enhance the resolution and visual quality of low-resolution images. In recent years, with the rapid advancement of deep learning technologies, a plethora of image super-resolution methods have been developed, most of which leverage the power of deep learning models to achieve superior performance. Early methods were predominantly based on convolutional neural networks (CNNs), which gained popularity due to their efficient local feature extraction capabilities through the sliding window mechanism and parameter sharing. However, one inherent limitation of CNNs is their restricted receptive field, which limits their ability to capture long-range dependencies and contextual information within the image. As a result, convolution-based methods may struggle to fully restore fine details in distant regions of the image. With the advent of Transformer technology in computer vision, self-attention mechanisms have demonstrated remarkable capability in capturing global dependencies across the entire image. This allows for better restoration of super-resolution images with enhanced clarity and detail. Nonetheless, the increased computational cost associated with self-attention mechanisms introduces significant challenges, particularly in terms of algorithmic complexity and resource consumption. Therefore, balancing high-precision image reconstruction with the need for reduced computational resources remains a critical challenge. Achieving this balance is essential for the broader adoption and practical application of super-resolution reconstruction techniques across various real-world scenarios.
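To make the cost gap concrete, for a feature map with N = HW spatial positions and C channels, the standard per-layer complexity estimates (general results, not figures from this paper) are:

```latex
% Self-attention compares every position with every other position
% (quadratic in N); a k x k convolution touches only a fixed neighborhood
% per position (linear in N).
\mathcal{O}_{\text{self-attention}} = \mathcal{O}\!\left(N^{2}C\right), \qquad
\mathcal{O}_{k \times k\ \text{convolution}} = \mathcal{O}\!\left(k^{2}NC\right), \qquad N = HW
```

For a 1280×720 input, N ≈ 9.2×10^5, so the quadratic term dominates and explains why global self-attention becomes prohibitive at super-resolution scales.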
Method
To address these challenges, this paper proposes MLFN, a super-resolution reconstruction method based on a multi-scale large-kernel attention feature fusion network. The approach involves four main stages. First, the low-resolution image is preprocessed and fed into the network, where an unconstrained blueprint convolution extracts shallow features. These features then pass through multi-path feature extraction modules that capture both global and local information. Within each multi-path feature extraction module, a multi-scale large-kernel separable convolution block enlarges the network's receptive field while minimizing parameter consumption; a sketch of one way to realize such a block is given below. Finally, a lightweight normalization-based attention module at the end of the network further improves reconstruction accuracy. The multi-path structure learns feature representations at different levels, enhancing the network's multi-scale extraction capability, and the multi-scale large-kernel separable convolution block balances the powerful global information capture of self-attention mechanisms with the strong local perception of convolution, enabling better extraction of both global and local features. Together with the normalization-based attention module, this yields improved performance under a lightweight network design.
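As a hedged illustration of what a large-kernel separable convolution block can look like, the sketch below follows the depth-wise + dilated depth-wise + point-wise factorization popularized by large-kernel attention designs; the specific kernel sizes, dilation, and multi-scale arrangement are assumptions, not MLFN's published configuration.

```python
import torch
import torch.nn as nn

class LargeKernelSepAttention(nn.Module):
    """Sketch of a large-kernel separable attention unit: a large receptive
    field (roughly 21x21 here) is factorized into a 5x5 depth-wise conv, a
    7x7 depth-wise conv with dilation 3, and a 1x1 point-wise conv, and the
    result gates the input element-wise. All sizes are assumptions."""
    def __init__(self, channels: int):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
        self.dw_dilated = nn.Conv2d(channels, channels, 7, padding=9,
                                    dilation=3, groups=channels)
        self.pw = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.pw(self.dw_dilated(self.dw(x)))
        return x * attn  # attention-style gating injects long-range context

x = torch.randn(1, 64, 48, 48)
assert LargeKernelSepAttention(64)(x).shape == x.shape
```

Compared with a dense 21×21 convolution costing 441·C² parameters, this factorization costs roughly (25 + 49 + C)·C, which is how a large receptive field is obtained at small parameter cost.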
Results
MLFN was trained on the DF2K dataset, which combines the 800 DIV2K and 2650 Flickr2K training images, and evaluated on five benchmark datasets: Set5, Set14, BSD100, Urban100, and Manga109; Set5 also served as the validation set. Low-resolution inputs were simulated by bicubically downsampling the high-resolution images to the desired scales (×2, ×3, ×4). Because the human visual system is more sensitive to luminance detail than to color changes, evaluations were conducted on the Y (luminance) channel in the YCbCr color space, with peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) as metrics. Experiments on the five public test datasets compared MLFN with eleven representative methods. The results indicate that MLFN consistently outperforms IMDN across upscaling factors, with an average PSNR improvement of 0.2 dB, and the reconstructed images show a clear visual advantage. A parameter-count comparison further shows that the proposed model maintains an advantage over other advanced lightweight methods.
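The sketch below shows this evaluation protocol in plain NumPy; the BT.601 conversion constants are the standard choice for Y-channel SR evaluation, and the border crop equal to the scale factor is a common convention assumed here rather than taken from the paper.

```python
import numpy as np

def rgb_to_y(img: np.ndarray) -> np.ndarray:
    """Y (luminance) channel of an RGB image with values in [0, 255],
    using ITU-R BT.601 YCbCr weights as commonly done for SR evaluation."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0

def psnr_y(sr: np.ndarray, hr: np.ndarray, scale: int = 4) -> float:
    """PSNR between two RGB images on the Y channel, cropping `scale`
    border pixels (a common convention, assumed here)."""
    y_sr = rgb_to_y(sr.astype(np.float64))[scale:-scale, scale:-scale]
    y_hr = rgb_to_y(hr.astype(np.float64))[scale:-scale, scale:-scale]
    mse = np.mean((y_sr - y_hr) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```

As a sanity check on the reported margin: since PSNR = 10·log10(255²/MSE), a 0.2 dB gain corresponds to dividing the MSE by 10^0.02 ≈ 1.047, i.e. roughly 4.5% lower mean squared error on the Y channel.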
Conclusion
This paper introduces a novel super-resolution reconstruction method, MLFN, which is based on a multi-scale large-kernel attention feature fusion network. The core of the approach is the multi-scale large-kernel separable convolution block. This block improves the overall quality of image reconstruction by balancing the strength of self-attention mechanisms, which excel at capturing long-range global dependencies, with the robust local perception of convolutional layers. The combination allows more accurate and efficient extraction of both global and local features, addressing a common limitation of traditional convolutional neural networks, which struggle to capture long-range contextual information. Moreover, the method introduces multi-path feature extraction blocks designed to capture feature representations at different levels. This multi-path architecture lets the network gather image details at various scales, significantly improving reconstruction accuracy across resolutions. Another important component is the lightweight normalization-based attention mechanism, which enhances the model by selectively focusing on important features without introducing additional fully connected or convolutional layers, thus keeping the parameter count low; a sketch of this idea follows below. Thanks to its optimized architecture, the method achieves high image reconstruction quality with a compact model, making it particularly suitable for mobile devices and embedded systems where computational resources are limited. As a result, MLFN provides a lightweight and efficient solution for image super-resolution reconstruction tasks.
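A minimal sketch of a normalization-based channel attention in the spirit of NAM (Liu et al., 2021), which the paper cites: the batch-norm scale factors γ serve directly as channel-importance weights, so the module adds no fully connected or convolutional layers. Whether MLFN uses exactly this form is an assumption here.

```python
import torch
import torch.nn as nn

class ChannelNAM(nn.Module):
    """Normalization-based channel attention sketch (NAM-style): BatchNorm
    scale factors gamma measure per-channel variance contribution and are
    reused as attention weights, adding no extra FC/conv parameters."""
    def __init__(self, channels: int):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.bn(x)
        gamma = self.bn.weight.abs()
        weights = gamma / gamma.sum()          # per-channel importance
        out = out * weights.view(1, -1, 1, 1)  # re-scale normalized features
        return x * torch.sigmoid(out)          # gate the input features

x = torch.randn(2, 64, 48, 48)
assert ChannelNAM(64)(x).shape == x.shape
```

Because the attention weights are read off parameters the network already has, the module's only learnable cost is the BatchNorm's 2C parameters, which matches the lightweight goal described above.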
Keywords: image super-resolution (SR) reconstruction; large-kernel separable convolution; attention mechanism; feature fusion; multi-path learning
Bevilacqua M, Roumy A, Guillemot C and Alberi-Morel M L. 2012. Low-complexity single-image super-resolution based on nonnegative neighbor embedding //Proceedings of the British Machine Vision Conference. Guildford: BMVA Press: 135 [DOI: 10.5244/C.26.135]
Chen H, Wang Y H, Guo T Y, Xu C, Deng Y P, Liu Z H, Ma S W, Xu C J, Xu C and Gao W. 2021. Pre-trained image processing transformer //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Nashville: IEEE: 12299-12310 [DOI: 10.1109/cvpr46437.2021.01212]
Chen H Y, Gu J J and Zhang Z. 2021. Attention in attention network for image super-resolution [EB/OL]. [2021-11-07]. https://arxiv.org/pdf/2104.09497.pdf
Cheng G A, Matsune A, Du H, Liu X Z and Zhan S. 2022. Exploring more diverse network architectures for single image super-resolution. Knowledge-Based Systems, 235: 107648 [DOI: 10.1016/j.knosys.2021.107648]
Choi H, Lee J and Yang J. 2023. NGswin: N-Gram Swin transformer for efficient single image super-resolution //Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE [DOI: 10.1109/cvpr52729.2023.00206]
Dong C, Loy C C, He K M and Tang X O. 2014. Learning a deep convolutional network for image super-resolution //Proceedings of the 13th European Conference on Computer Vision. Zurich: Springer: 184-199 [DOI: 10.1007/978-3-319-10593-2_13]
Dong C, Loy C C, He K M and Tang X O. 2016. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2): 295-307 [DOI: 10.1109/TPAMI.2015.2439281]
Gao D D, Zhou D W, Wang W J, Ma Y and Li S S. 2023. Lightweight super-resolution via grouping fusion of feature frequencies. Journal of Computer-Aided Design & Computer Graphics, 35(7): 1020-1031 [DOI: 10.3724/SP.J.1089.2023.19524]
Haase D and Amthor M. 2020. Rethinking depthwise separable convolutions: how intra-kernel correlations lead to improved MobileNets //Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE: 14600-14609 [DOI: 10.1109/cvpr42600.2020.01461]
Huang J B, Singh A and Ahuja N. 2015. Single image super-resolution from transformed self-exemplars //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE: 5197-5206 [DOI: 10.1109/cvpr.2015.7299156]
Hui Z, Wang X M and Gao X B. 2018. Fast and accurate single image super-resolution via information distillation network //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE: 723-731 [DOI: 10.1109/cvpr.2018.00082]
Hui Z, Gao X B, Yang Y C and Wang X M. 2019. Lightweight image super-resolution with information multi-distillation network //Proceedings of the 27th ACM International Conference on Multimedia. New York: Association for Computing Machinery: 2024-2032 [DOI: 10.1145/3343031.3351084]
Kim J, Lee J K and Lee K M. 2016. Accurate image super-resolution using very deep convolutional networks //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE: 1646-1654 [DOI: 10.1109/cvpr.2016.182]
Kingma D P and Ba J. 2015. Adam: a method for stochastic optimization //Proceedings of the International Conference on Learning Representations. San Diego: 1-13 [DOI: 10.48550/arXiv.1412.6980]
Lai W S, Huang J B, Ahuja N and Yang M H. 2017. Deep Laplacian pyramid networks for fast and accurate super-resolution //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE: 624-632 [DOI: 10.1109/CVPR.2017.618]
Ledig C, Theis L, Huszár F, Caballero J, Cunningham A, Acosta A and Aitken A. 2017. Photo-realistic single image super-resolution using a generative adversarial network //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE: 4681-4690 [DOI: 10.1109/cvpr.2017.19]
Lei S, Shi Z W and Mo W J. 2022. Transformer-based multistage enhancement for remote sensing image super-resolution. IEEE Transactions on Geoscience and Remote Sensing, 60: 5610-5611 [DOI: 10.1109/TGRS.2021.3136190]
Liang J Y, Cao J Z, Sun G L, Zhang K, Van Gool L and Timofte R. 2021. SwinIR: image restoration using Swin transformer //Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops. Montreal: IEEE: 1833-1844 [DOI: 10.1109/iccvw54120.2021.00210]
Lim B, Son S, Kim H, Nah S and Lee K M. 2017. Enhanced deep residual networks for single image super-resolution //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu: IEEE: 1132-1140 [DOI: 10.1109/cvprw.2017.151]
Liu J, Tang J and Wu G S. 2020. Residual feature distillation network for lightweight image super-resolution //European Conference on Computer Vision Workshops. Glasgow: Springer: 41-55 [DOI: 10.1007/978-3-030-67070-2_2]
Liu Y C, Shao Z R, Teng Y Y and Hoffmann N. 2021. NAM: normalization-based attention module //Proceedings of the NeurIPS 2021 Workshop on ImageNet: Past, Present, and Future. Virtual [DOI: 10.48550/arXiv.2111.12419]
Martin D, Fowlkes C, Tal D and Malik J. 2001. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics //Proceedings of the IEEE International Conference on Computer Vision. Kauai: IEEE: 416-423 [DOI: 10.1109/iccv.2001.937655]
Matsui Y, Ito K, Aramaki Y, Fujimoto A, Ogawa T, Yamasaki T and Aizawa K. 2017. Sketch-based manga retrieval using Manga109 dataset. Multimedia Tools and Applications, 76(20): 21811-21838 [DOI: 10.1007/s11042-016-4020-z]
Mehri A, Behjati P and Sappa A D. 2023. TnTViT-G: transformer in transformer network for guidance super resolution. IEEE Access, 11: 11529-11540 [DOI: 10.1109/ACCESS.2023.3241852]
Meng Z Q, Zhang J and Qiu J S. 2022. Multi-supervision loss function based smoothed super-resolution image reconstruction. Journal of Image and Graphics, 27(10): 2972-2983 [DOI: 10.11834/jig.210235]
Shi W Z, Caballero J, Huszár F, Totz J, Aitken A P, Bishop R, Rueckert D and Wang Z. 2016. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE: 1874-1883 [DOI: 10.1109/CVPR.2016.207]
Timofte R, Agustsson E, Van Gool L, Yang M H and Zhang L. 2017. NTIRE 2017 challenge on single image super-resolution: methods and results //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu: IEEE: 114-125 [DOI: 10.1109/cvprw.2017.150]
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł and Polosukhin I. 2017. Attention is all you need //Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach: Curran Associates Inc: 6000-6010 [DOI: 10.48550/ARXIV.1706.03762]
Wang L G, Dong X Y, Wang Y Q, Ying X Y, Lin Z P, An W and Guo Y L. 2021. Exploring sparsity in image super-resolution for efficient inference //Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE: 4917-4926 [DOI: 10.1109/cvpr46437.2021.00488]
Wang Z H, Chen J and Hoi S C H. 2020. Deep learning for image super-resolution: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(10): 3365-3387 [DOI: 10.1109/TPAMI.2020.2982166]
Xiong W, Xiong C Y, Gao Z R, Chen W Q, Zheng R H and Tian J W. 2023. Image super-resolution with channel-attention-embedded Transformer. Journal of Image and Graphics, 28(12): 3744-3757 [DOI: 10.11834/jig.221033]
Xu L, Song H H and Liu Q S. 2023. Super-resolution reconstruction of binocular image based on multi-level fusion attention network. Journal of Image and Graphics, 28(4): 1079-1090 [DOI: 10.11834/jig.211119]
Zeyde R, Elad M and Protter M. 2010. On single image scale-up using sparse-representations //International Conference on Curves and Surfaces. Avignon: Springer: 711-730 [DOI: 10.1007/978-3-642-27413-8_47]
Zhang Y L, Li K P, Li K, Wang L C, Zhong B and Fu Y. 2018. Image super-resolution using very deep residual channel attention networks //European Conference on Computer Vision. Munich: Springer: 286-301 [DOI: 10.1007/978-3-030-01234-2_18]
Zhang Y L, Tian Y P, Kong Y, Zhong B and Fu Y. 2018. Residual dense network for image super-resolution //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE: 2472-2481 [DOI: 10.1109/cvpr.2018.00262]
Zhao H Y, Kong X T, He J W, Qiao Y and Dong C. 2020. Efficient image super-resolution using pixel attention //Proceedings of the European Conference on Computer Vision Workshops. Glasgow: Springer: 56-72 [DOI: 10.1007/978-3-030-67070-2_3]