双阶段信息蒸馏的轻量级图像超分辨率网络
Lightweight image super-resolution network via two-stage information distillation
2021年26卷第5期 页码: 991-1005
纸质出版日期: 2021-05-16
录用日期: 2020-09-02
DOI: 10.11834/jig.200265
李明鸿, 常侃, 李恒鑫, 谭宇飞, 覃团发. 双阶段信息蒸馏的轻量级图像超分辨率网络[J]. 中国图象图形学报, 2021,26(5):991-1005.
Minghong Li, Kan Chang, Hengxin Li, Yufei Tan, Tuanfa Qin. Lightweight image super-resolution network via two-stage information distillation[J]. Journal of Image and Graphics, 2021,26(5):991-1005.
目的
在图像超分辨率(super resolution,SR)任务中采用大尺寸的卷积神经网络(convolutional neural network,CNN)可以获得理想的性能,但是会引入大量参数,导致繁重的计算负担,并不适合很多计算资源受限的应用场景。为了解决上述问题,本文提出一种基于双阶段信息蒸馏的轻量级网络模型。
方法
提出一个双阶段带特征补偿的信息蒸馏模块(two-stage feature-compensated information distillation block,TFIDB)。TFIDB采用双阶段、特征补偿的信息蒸馏机制,有选择地提炼关键特征,同时将不同级别的特征进行合并,不仅提高了特征提炼的效率,还能促进网络内信息的流动。同时,TFIDB引入通道关注(channel attention,CA)机制,将经过双阶段信息蒸馏机制提炼的特征进行重要性判别,增强对特征的表达能力。以TFIDB为基础构建模块,提出完整的轻量级网络模型。在提出的网络模型中,设计了信息融合单元(information fusion unit,IFU)。IFU将网络各层级的信息进行有效融合,为最后重建阶段提供准确、丰富的层级信息。
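下面以NumPy给出通道关注(CA)机制的一个简化示意（并非本文的官方实现：全连接权重为随机初始化，缩减比例 reduction 为假设值），用于说明“全局平均池化—两层全连接—Sigmoid按通道加权”的基本流程：

```python
import numpy as np

def channel_attention(x, reduction=4, seed=0):
    """通道关注(CA)的简化示意: 全局平均池化 -> 两层全连接 -> Sigmoid加权。
    x 形状为 (C, H, W); 权重随机初始化, 仅用于演示数据流。"""
    c = x.shape[0]
    rng = np.random.default_rng(seed)
    w1 = rng.standard_normal((c // reduction, c)) * 0.1  # 降维全连接
    w2 = rng.standard_normal((c, c // reduction)) * 0.1  # 升维全连接
    s = x.mean(axis=(1, 2))                # 压缩: 每通道的全局统计量, 形状 (C,)
    z = np.maximum(w1 @ s, 0.0)            # ReLU
    a = 1.0 / (1.0 + np.exp(-(w2 @ z)))    # Sigmoid, 得到 (0,1) 内的通道权重
    return x * a[:, None, None]            # 按通道重新加权输入特征
```

训练后, 权重 a 对重要通道接近1、对冗余通道接近0, 从而实现对提炼后特征的重要性判别。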
结果
在5个基准测试集上，当放大倍数为2时，相较于知名的轻量级网络CARN (cascading residual network)，本文算法分别获得了0.29 dB、0.08 dB、0.08 dB、0.27 dB和0.42 dB的峰值信噪比(peak signal-to-noise ratio, PSNR)增益，且模型参数量和乘加运算量明显更少。
结论
提出的双阶段带补偿的信息蒸馏机制可以有效提升网络模型的效率。将多个TFIDB进行级联,并辅以IFU模块构成的轻量级网络可以在模型尺寸和性能之间达到更好的平衡。
Objective
Given a low-resolution image, the task of single image super-resolution (SR) is to reconstruct the corresponding high-resolution image. Due to the ill-posed characteristic of this problem, it is challenging to recover the lost details and well preserve the structures in images. To deal with this problem, different kinds of methods have been proposed in the past two decades, including interpolation-based methods, learning-based methods, and reconstruction-based methods. Recently, convolutional neural network (CNN)-based SR methods have achieved great success and received much attention. Several CNNs have been proposed for the SR task, including the residual dense network (RDN), the enhanced deep residual network for super-resolution (EDSR), and the residual channel attention network. Although superior performance has been achieved, many methods utilize very large-scale networks, which inevitably leads to a large number of parameters and heavy computational complexity. For example, RDN costs 22.3 million (M) parameters, and the number of parameters of EDSR even reaches 43 M. As a result, those methods might not be suitable for applications with limited memory and computing resources. To solve the above problem, this study proposes a lightweight CNN model using a two-stage information distillation strategy.
Method
The proposed lightweight CNN model is called the two-stage feature-compensated information distillation network (TFIDN). There are three main characteristics in TFIDN. First of all, a highly efficient module, called the two-stage feature-compensated information distillation block (TFIDB), is proposed as the basic building block of TFIDN. In each TFIDB, the features can be accurately divided into different parts and then progressively refined by the two stages of information distillation. To this end, 1×1 convolution layers are applied in TFIDB to implicitly learn the packing strategy, which is responsible for selecting the suitable components from the target features for further refinement. Compared with the existing information distillation network (IDN), where only one stage of information distillation is carried out, the proposed two-stage information distillation strategy can extract the features much more precisely. Besides information distillation, TFIDB additionally introduces a feature compensation mechanism, which guarantees the completeness of the features and also enforces consistency among local memories. More specifically, the operation of feature compensation is performed by concatenating and fusing the cross-layer transferred features and the refined features. Unlike IDN, there is no need to manually adjust the output feature dimensions of different convolution layers in TFIDB; thus, the structure of TFIDB is more flexible. Second, to further increase the ability of feature extraction and discrimination learning, the wide activated super-resolution (WDSR) unit and the channel attention (CA) mechanism are both introduced in TFIDB. To improve the performance of the normal residual learning block, the WDSR unit expands the features before performing activation. To maintain the same number of parameters as that of a normal residual learning block, the input feature dimension of the WDSR unit is set to 32 in this study. Although the CA unit can effectively improve the discrimination learning ability of the network, applying too many CA units could significantly increase the depth of the network. Therefore, only one CA unit is attached at the end of each TFIDB, so as to maintain the efficiency of the network. Because the CA operation is carried out on the precisely refined features, the effectiveness of the network can be ensured. Finally, to build the full TFIDN, a number of TFIDBs are cascaded. To keep a balance between model complexity and performance, the number of TFIDBs is set to 3. To take full advantage of different levels of information, an information fusion unit (IFU) is designed to fuse the outputs of different TFIDBs. In the existing cascading residual network (CARN), dense connections are utilized among the building blocks, leading to a relatively large number of parameters. Different from CARN, to keep a small number of parameters, the IFU only introduces one 1×1 convolution layer, which results in only 3 kilo (K) parameters.
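As an illustrative sketch (not the paper's official implementation: the channel split ratio, weight shapes, and the refine function below are all assumptions), the two-stage distillation flow described above can be outlined in plain NumPy, where a 1×1 convolution packs the features, half of the packed channels are distilled (kept as-is), and the rest are sent on for further refinement:

```python
import numpy as np

def conv1x1(x, w):
    # A 1x1 convolution is a per-pixel linear map over channels.
    # x: (C_in, H, W), w: (C_out, C_in) -> output: (C_out, H, W)
    return np.tensordot(w, x, axes=([1], [0]))

def distill_stage(x, w_pack, refine):
    """One information-distillation stage (simplified sketch).
    w_pack plays the role of the learned 1x1 packing layer; half of
    its outputs are distilled (kept), the other half are refined."""
    packed = conv1x1(x, w_pack)
    c = packed.shape[0] // 2
    kept, remainder = packed[:c], packed[c:]
    return kept, refine(remainder)

def two_stage_distill(x, w1, w2, refine):
    kept1, r1 = distill_stage(x, w1, refine)   # stage 1
    kept2, r2 = distill_stage(r1, w2, refine)  # stage 2
    # Feature compensation (sketch): concatenate all feature levels
    # so that a subsequent fusion layer sees the complete information.
    return np.concatenate([kept1, kept2, r2], axis=0)
```

In the actual network the refine path is a WDSR-style residual unit and the concatenated features are fused by a further convolution; here `refine` is left as a pluggable placeholder.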
Result
The proposed TFIDN is trained on the DIV2K dataset. Five widely used datasets, including Set5, Set14, BSD100, Urban100, and Manga109, are used for testing. The ablation study shows that the proposed building block TFIDB and the IFU both contribute to the superior performance of the network. Compared with six well-known lightweight models, including the fast super-resolution convolutional neural network, the very deep network for super-resolution, the Laplacian pyramid super-resolution network, the persistent memory network, IDN, and CARN, the proposed TFIDN is able to achieve the highest peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) values. Specifically, with a scale factor of 2 on the five testing datasets, the PSNR improvements of TFIDN over the second-best method CARN are 0.29 dB, 0.08 dB, 0.08 dB, 0.27 dB, and 0.42 dB, respectively, whereas the SSIM improvements are 0.001 6, 0.000 9, 0.001 7, 0.003 0, and 0.000 9, respectively. The significant PSNR and SSIM improvements indicate that TFIDN is more effective than CARN. On the other hand, the number of parameters and the number of mult-adds required by TFIDN are 933 K and 53.5 giga (G), respectively, both of which are smaller than those of CARN. This result suggests that TFIDN is also more efficient than CARN. Although the proposed TFIDN consumes more parameters and mult-adds than IDN, TFIDN achieves significantly higher performance in terms of PSNR and SSIM.
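The PSNR figures above follow the standard definition; a minimal NumPy helper (assuming 8-bit images, i.e., a peak value of 255) is:

```python
import numpy as np

def psnr(ref, img, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference image
    and a reconstructed image of the same shape."""
    mse = np.mean((np.asarray(ref, dtype=np.float64)
                   - np.asarray(img, dtype=np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

Note that published SR results additionally follow dataset-specific conventions (e.g., computing PSNR on the luminance channel and cropping image borders by the scale factor), so this helper alone will not exactly reproduce the reported numbers.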
Conclusion
The proposed two-stage feature-compensated information distillation mechanism is efficient and effective. By cascading a number of TFIDBs and introducing the IFU, the proposed lightweight network TFIDN achieves a better trade-off among model size, computational complexity, and performance.
超分辨率(SR); 卷积神经网络(CNN); 信息蒸馏; 宽激活; 通道关注(CA)
super-resolution (SR); convolutional neural network (CNN); information distillation; wide activation; channel attention (CA)
Agustsson E and Timofte R. 2017. NTIRE 2017 challenge on single image super-resolution: dataset and study//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu, USA: IEEE: 1122-1131[DOI: 10.1109/CVPRW.2017.150]
Ahn N, Kang B and Sohn K A. 2018. Fast, accurate, and lightweight super-resolution with cascading residual network//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 256-272[DOI: 10.1007/978-3-030-01249-6_16]
Bevilacqua M, Roumy A, Guillemot C and Morel M L A. 2012. Low-complexity single-image super-resolution based on nonnegative neighbor embedding//Proceedings of the 23rd British Machine Vision Conference. Surrey, UK: BMVA Press: 1-12[DOI: 10.5244/c.26.135]
Chang K, Li M H, Ding P L K and Li B X. 2020. Accurate single image super-resolution using multi-path wide-activated residual network. Signal Processing, 172: #107567[DOI: 10.1016/j.sigpro.2020.107567]
Dai T, Cai J R, Zhang Y B, Xia S T and Zhang L. 2019. Second-order attention network for single image super-resolution//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 11057-11066[DOI: 10.1109/CVPR.2019.01132]
Dong C, Loy C C, He K M and Tang X O. 2014. Learning a deep convolutional network for image super-resolution//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 184-199[DOI: 10.1007/978-3-319-10593-2_13]
Dong C, Loy C C and Tang X O. 2016. Accelerating the super-resolution convolutional neural network//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 391-407[DOI: 10.1007/978-3-319-46475-6_25]
He K M, Zhang X Y, Ren S Q and Sun J. 2015. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 1026-1034[DOI: 10.1109/ICCV.2015.123]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778[DOI: 10.1109/CVPR.2016.90]
Hu J, Shen L and Sun G. 2018. Squeeze-and-excitation networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7132-7141[DOI: 10.1109/CVPR.2018.00745]
Huang J B, Singh A and Ahuja N. 2015. Single image super-resolution from transformed self-exemplars//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 5197-5206[DOI: 10.1109/CVPR.2015.7299156]
Hui Z, Wang X M and Gao X B. 2018. Fast and accurate single image super-resolution via information distillation network//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 723-731[DOI: 10.1109/CVPR.2018.00082]
Kim J, Lee J K and Lee K M. 2016. Accurate image super-resolution using very deep convolutional networks//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1646-1654[DOI: 10.1109/CVPR.2016.182]
Lai W S, Huang J B, Ahuja N and Yang M H. 2017. Deep Laplacian pyramid networks for fast and accurate super-resolution//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 5835-5843[DOI: 10.1109/CVPR.2017.618]
Lim B, Son S, Kim H, Nah S and Lee K M. 2017. Enhanced deep residual networks for single image super-resolution//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu, USA: IEEE: 1132-1140[DOI: 10.1109/CVPRW.2017.151]
Matsui Y, Ito K, Aramaki Y, Fujimoto A, Ogawa T, Yamasaki T and Aizawa K. 2017. Sketch-based manga retrieval using manga109 dataset. Multimedia Tools and Applications, 76(20): 21811-21838[DOI: 10.1007/s11042-016-4020-z]
Mei Y Q, Fan Y C, Zhou Y Q, Huang L C, Huang T S and Shi H H. 2020. Image super-resolution with cross-scale non-local attention and exhaustive self-exemplars mining//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 5690-5699[DOI: 10.1109/CVPR42600.2020.00573]
Salimans T and Kingma D P. 2016. Weight normalization: a simple reparameterization to accelerate training of deep neural networks//Proceedings of the 30th International Conference on Neural Information Processing Systems. Red Hook, USA: Curran Associates Inc.: 901-909
Shi W Z, Caballero J, Huszár F, Totz J, Aitken A P, Bishop R, Rueckert D and Wang Z H. 2016. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1874-1883[DOI: 10.1109/CVPR.2016.207]
Soh J W, Cho S and Cho N I. 2020. Meta-transfer learning for zero-shot super-resolution//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 3516-3525[DOI: 10.1109/CVPR42600.2020.00357]
Tai Y, Yang J, Liu X M and Xu C Y. 2017. MemNet: a persistent memory network for image restoration//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 4549-4557[DOI: 10.1109/ICCV.2017.486]
Timofte R, De Smet V and Van Gool L. 2014. A+: adjusted anchored neighborhood regression for fast super-resolution//Proceedings of the 12th Asian Conference on Computer Vision. Singapore: Springer: 111-126[DOI: 10.1007/978-3-319-16817-3_8]
Timofte R, Gu S H, Wu J Q, Van Gool L, Zhang L, Yang M H, Haris M, Shakhnarovich G, Ukita N, Hu S J, Bei Y J, Hui Z, Jiang X, Gu Y N, Liu J, Wang Y F, Perazzi F, McWilliams B, Sorkine-Hornung A, Sorkine-Hornung O, Schroers C, Yu J H, Fan Y C, Yang J C, Xu N, Wang Z W, Wang X C, Huang T S, Wang X T, Yu K, Hui T W, Dong C, Lin L, Loy C C, Park D, Kim K, Chun S Y, Zhang K, Liu P J, Zuo W M, Guo S, Liu J Y, Xu J C, Liu Y J, Xiong F Y, Dong Y, Bai H L, Damian A, Ravi N, Menon S, Rudin C, Seo J, Jeon T, Koo J, Jeon S, Kim S Y, Choi J S, Ki S, Seo S, Sim H, Kim S, Kim M, Chen R, Zeng K, Guo J K, Qu Y Y, Li C H, Ahn N, Kang B, Sohn K A, Yuan Y, Zhang J W, Pang J H, Xu X Y, Zhao Y, Deng W, Hussain S U, Aadil M, Rahim R, Cai X W, Huang F, Xu Y S, Michelini P N, Zhu D, Liu H W, Kim J S, Lee J S, Huang Y W, Qiu M, Jing L T, Zeng J H, Wang Y, Sharma M, Mukhopadhyay R, Upadhyay A, Koundinya S, Shukla A, Chaudhury S, Zhang Z, Xu Y H and Fu L Z. 2018. NTIRE 2018 challenge on single image super-resolution: methods and results//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Salt Lake City, USA: IEEE: 965-976[DOI: 10.1109/CVPRW.2018.00130]
Tong J C, Fei J L, Chen J S, Li H and Ding D D. 2019. Multi-level feature fusion image super-resolution algorithm with recursive neural network. Journal of Image and Graphics, 24(2): 302-312
佟骏超, 费加罗, 陈靖森, 李恒, 丁丹丹. 2019. 递归式多阶特征融合图像超分辨率算法. 中国图象图形学报, 24(2): 302-312
Wang Z H, Chen J and Hoi S C H. 2020. Deep learning for image super-resolution: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence: 1-22[DOI: 10.1109/TPAMI.2020.2982166]
Xu Y S, Tseng S Y R, Tseng Y, Kuo H K and Tsai Y M. 2020. Unified dynamic convolutional network for super-resolution with variational degradations//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 12496-12505[DOI: 10.1109/CVPR42600.2020.01251]
Yang W M, Wang W, Zhang X C, Sun S F and Liao Q M. 2019. Lightweight feature fusion network for single image super-resolution. IEEE Signal Processing Letters, 26(4): 538-542[DOI: 10.1109/LSP.2018.2890770]
Ying Z L and Long X. 2019. Single-image super-resolution construction based on multi-scale dense residual network. Journal of Image and Graphics, 24(3): 410-419
应自炉, 龙祥. 2019. 多尺度密集残差网络的单幅图像超分辨率重建. 中国图象图形学报, 24(3): 410-419
Yu J H, Fan Y C, Yang J C, Xu N, Wang Z W, Wang X C and Huang T. 2018. Wide activation for efficient and accurate image super-resolution[EB/OL]. [2020-06-05]. https://arxiv.org/pdf/1808.08718v1.pdf
Zeyde R, Elad M and Protter M. 2010. On single image scale-up using sparse-representations//Proceedings of the 7th International Conference on Curves and Surfaces. Avignon, France: Springer: 711-730[DOI: 10.1007/978-3-642-27413-8_47]
Zhang Y L, Li K P, Li K, Wang L C, Zhong B N and Fu Y. 2018a. Image super-resolution using very deep residual channel attention networks//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 294-310[DOI: 10.1007/978-3-030-01234-2_18]
Zhang Y L, Li K P, Li K, Zhong B N and Fu Y. 2019. Residual non-local attention networks for image restoration//Proceedings of 2019 International Conference on Learning Representations. New Orleans, USA: [s.n.]
Zhang Y L, Tian Y P, Kong Y, Zhong B N and Fu Y. 2018b. Residual dense network for image super-resolution//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 2472-2481[DOI: 10.1109/CVPR.2018.00262]