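Objective In recent years, deep convolutional neural networks have become a research hotspot in single-image super-resolution reconstruction. Most network structures are built by chained stacking, which leaves the connections between layers weak and the hierarchical features underused. To address these problems, an image super-resolution reconstruction method based on a multi-stage fusion network is proposed to further improve reconstruction quality. Method A feature extraction network first obtains the low-frequency features of the image, which then serve as the input to two sub-networks. The first obtains the structural feature information of the low-resolution image through an encoding network. The second obtains high-frequency features through a multi-path feedforward network composed of staged feature fusion units. Each fusion unit fuses the features of several consecutive layers and obtains effective features adaptively; multi-path connections then link the different fusion units to strengthen the relationships among them, extract more effective features, and raise the utilization of hierarchical features. Finally, the features from the two sub-networks are fused, and residual learning is used to reconstruct the high-resolution image. Result Experiments were conducted on four benchmark test sets: Set5, Set14, B100 and Urban100. At a scale factor of 4, the peak signal-to-noise ratios are 31.69 dB, 28.24 dB, 27.39 dB and 25.46 dB, respectively, an improvement over the results of other methods. Conclusion The proposed network overcomes the drawbacks of the chain structure: it makes full use of hierarchical features to extract more high-frequency information and also uses the structural feature information carried by the low-resolution image itself to complete the reconstruction, achieving good reconstruction results.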
Image super-resolution reconstruction via deep network based on multi-staged fusion
Shen Mingyu, Yu Pengfei, Wang Ronggui, Yang Juan, Xue Lixia (School of Computer and Information, Hefei University of Technology)
Objective Image super-resolution is an important branch of digital image processing and computer vision and has been widely used in recent years in video surveillance, medical imaging, and security imaging. Its purpose is to reconstruct a high-resolution (HR) image from an observed, degraded low-resolution (LR) image. Early methods include interpolation, neighborhood embedding, and sparse coding. Recently, deep convolutional neural networks have become a research hotspot in single-image super-resolution reconstruction because they learn the mapping between LR and HR images better than traditional learning-based methods. However, many deep learning (DL)-based methods have two obvious drawbacks. First, most of them build the network by chained stacking, so each layer is related only to the layer before it, which leads to weak inter-layer relationships. Second, the hierarchical features of the network are not fully utilized. These shortcomings can lead to the loss of high-frequency components. To address them, a novel image super-resolution reconstruction method based on a multi-staged fusion network is proposed to improve the quality of image reconstruction. Method Studies have shown that feature reuse improves the ability of a network to extract and express features, so our research is based on this idea. We implement it through multipath connections, which take two forms: a global multipath mode and a local fusion unit. First, the proposed model takes an interpolated low-resolution image as input. The feature extraction network extracts shallow features, which serve as the input to the mixture network. The mixture network consists of two parts. One is a pixel encoding network, which obtains the structural feature information of the image.
It has four weight layers, each consisting of 64 filters of size 1×1, which ensures that the distribution of the feature maps is not destroyed. This process is similar to encoding and decoding pixels. The other is a multi-path feedforward network, which extracts the high-frequency components needed for reconstruction. It is formed by several staged feature fusion units connected in multi-path mode. Each fusion unit is composed of a dense connection layer, a residual learning layer, and a feature selection layer. The dense connection layer consists of four weight layers, each with 32 filters of size 3×3, and improves the nonlinear mapping capability of the network so that more high-frequency information is extracted. The residual learning layer contains a 1×1 weight layer to alleviate the vanishing-gradient problem. The feature selection layer uses a 1×1 weight layer to obtain effective features. The multi-path mode then connects the different units, which strengthens the relationships between the fusion units, so more effective features are extracted and the utilization of hierarchical features increases. Both sub-networks output 64 feature maps. Their outputs are fused and fed to the reconstruction network, which includes a 1×1 weight layer and produces the residual image between the low-resolution and high-resolution images. Finally, the reconstructed image is obtained by combining the original low-resolution image with the residual image. During training, we choose the rectified linear unit as the activation function to accelerate training and avoid vanishing gradients. For each weight layer with a 3×3 filter, we pad one pixel so that all feature maps have the same size, which improves the edge information of the reconstructed image.
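As a rough illustration of the fusion unit described above, the following Python sketch traces channel widths through the four dense layers and counts their weights. It assumes DenseNet-style concatenation, a 64-channel unit input, and a 1×1 layer that maps the concatenated features back to 64 channels. None of these wiring details is stated in this abstract, and all function names are hypothetical.

```python
def conv_params(in_ch, out_ch, k):
    """Weights plus biases of one k x k convolution layer."""
    return in_ch * out_ch * k * k + out_ch


def dense_block_widths(in_ch=64, growth=32, layers=4):
    """Input width of each dense layer and the final concatenated width.

    Assumption: every dense layer sees the unit input concatenated with
    the outputs of all earlier dense layers (DenseNet-style wiring).
    """
    widths = []
    ch = in_ch
    for _ in range(layers):
        widths.append(ch)  # channels entering this 3x3 layer
        ch += growth       # its 32 output maps are appended
    return widths, ch


widths, concat_ch = dense_block_widths()
# Four 3x3 layers with 32 filters each, as stated in the text.
dense_params = sum(conv_params(w, 32, 3) for w in widths)
# Assumed 1x1 layer compressing the concatenation back to 64 maps.
fuse_params = conv_params(concat_ch, 64, 1)

print("dense layer input widths:", widths)
print("concatenated width:", concat_ch)
print("dense params:", dense_params, "1x1 params:", fuse_params)
```

Under these assumptions the concatenated width grows by 32 channels per dense layer, which is why a 1×1 layer is a cheap way to bring the feature count back down before the next unit.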
Furthermore, the initial learning rate is set to 0.1 and halved every 10 epochs, which accelerates the convergence of the network. We use mini-batch SGD with the momentum parameter set to 0.9. We employ 291 images as the training set. In addition, we use data augmentation (rotation by 90°, 180°, and 270°, and vertical flipping) to enlarge the training set, which increases sample diversity and helps avoid overfitting. The network is trained with multiple scale factors (×2, ×3, and ×4), so it can be used for reconstruction at different scale factors. Result All experiments are implemented in PyTorch. We use four common benchmark sets (Set5, Set14, B100, and Urban100) to evaluate our model, with peak signal-to-noise ratio (PSNR) as the evaluation criterion. All RGB images are converted to the YCbCr space. Because human vision is more sensitive to the luminance channel, the proposed algorithm reconstructs only the luminance channel Y, and the Cb and Cr channels are reconstructed by interpolation. The experimental results on the four benchmark sets for a scale factor of 4 are 31.69 dB, 28.24 dB, 27.39 dB, and 25.46 dB, respectively. Compared with Bicubic, A+, SRCNN, VDSR, DRCN, and DRRN, the proposed method shows better performance and visual effects. In addition, we have validated the effectiveness of the proposed components, namely the multipath mode, the staged fusion unit, and the pixel encoding network. Conclusion The proposed network overcomes the shortcoming of the chain structure: it extracts more high-frequency information by making full use of the hierarchical features and simultaneously uses the structural feature information carried by the low-resolution image to complete the reconstruction. Furthermore, techniques such as dense connections and residual learning are employed to accelerate convergence and mitigate gradient problems during training.
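The step schedule given in the Method part (initial rate 0.1, halved every 10 epochs) can be written as a small function. This is a minimal sketch that assumes zero-based epoch counting; the function name and its parameters are hypothetical and not taken from the authors' code.

```python
def learning_rate(epoch, base_lr=0.1, step=10, factor=0.5):
    """Step decay: multiply the rate by `factor` once every `step` epochs.

    Assumption: epochs are counted from 0, so epochs 0-9 use base_lr.
    """
    return base_lr * factor ** (epoch // step)


# Print the rate at the start of each 10-epoch stage.
for epoch in (0, 10, 20, 30):
    print(epoch, learning_rate(epoch))
```

In PyTorch the same behavior is usually obtained with a step scheduler attached to the optimizer; the plain function here only makes the arithmetic explicit.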
Extensive experiments show that the proposed method reconstructs images with more high-frequency detail than other methods that use the same preprocessing step. In future work, we will consider using recursive learning and increasing the number of training samples to further optimize the model.
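The evaluation protocol in the Result part (PSNR computed on the luminance channel only) can be sketched as follows. The abstract does not say which RGB-to-YCbCr conversion was used, so this example assumes the ITU-R BT.601 studio-range luma formula for 8-bit pixels and a peak value of 255, both common in super-resolution benchmarks. The function names are hypothetical.

```python
import math


def rgb_to_y(r, g, b):
    """Luma of one 8-bit RGB pixel (assumed BT.601, range 16 to 235)."""
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0


def psnr(reference, estimate, peak=255.0):
    """PSNR in dB between two equally sized 2-D lists of luma values."""
    squared_error = 0.0
    count = 0
    for ref_row, est_row in zip(reference, estimate):
        for ref, est in zip(ref_row, est_row):
            squared_error += (ref - est) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Usage: compare the Y channels of two tiny made-up RGB images.
hr_rgb = [[(255, 255, 255), (0, 0, 0)]]
sr_rgb = [[(250, 250, 250), (5, 5, 5)]]
hr_y = [[rgb_to_y(*p) for p in row] for row in hr_rgb]
sr_y = [[rgb_to_y(*p) for p in row] for row in sr_rgb]
score = psnr(hr_y, sr_y)
print(round(score, 2), "dB")
```

The pixel values above are invented for illustration and are unrelated to the reported benchmark numbers.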