Information interactive linear Transformer network for infrared and visible image fusion
Pages: 1-14 (2024)
Published Online: 23 December 2024
DOI: 10.11834/jig.240569
Yang Tianyu, Huo Hongtao, Liu Xiaowen, et al. Information interactive linear Transformer network for infrared and visible image fusion[J]. Journal of Image and Graphics,
Objective
Transformer-based deep learning techniques have shown excellent performance in infrared and visible image fusion. However, Transformer-based image fusion networks suffer from low computational efficiency and require substantial computational resources and storage space during training. Additionally, most existing fusion methods overlook cross-modal information interaction during the encoding phase, leading to the loss of complementary information, especially in complex scenarios such as low-light conditions or smoke occlusion, where it is challenging to highlight target features. To address these challenges, we propose a fusion network based on an information interaction linear Transformer (I2F Transformer). The proposed I2F Transformer reduces computational costs while enhancing cross-modal information integration, producing fused images with clear backgrounds and prominent targets.
Method
In this study, we propose a novel infrared and visible image fusion algorithm built on an information interaction linear Transformer network with a dual-branch encoder and a decoder. The encoder adopts a dual-branch structure to extract features from the infrared and visible images separately: it first extracts shallow features from the source images with a 3×3 convolutional layer with a stride of 1. These shallow features are then fed into three cascaded global information fast modules (GIFM), in which complementary features from the infrared and visible feature maps are further extracted and integrated through the I2F Transformer, achieving deep integration of cross-modal features. The decoder consists of five cascaded convolutional modules, each containing a convolutional layer and an activation layer, and finally reconstructs the fused features into a high-quality fused image. To improve training, a Fourier transform loss function is designed to preserve critical frequency-domain information from the source images, ensuring the detail and clarity of the fused images.
Result
Comparative experiments were conducted on the MSRS and TNO datasets against 13 traditional and deep learning-based fusion methods. In the subjective evaluation, the proposed method shows a clear advantage in complex scenarios, producing fused images with clearly outlined targets and high visual quality. In the objective evaluation, on the MSRS dataset our method achieves the best values on five metrics: entropy (EN), visual information fidelity (VIF), average gradient (AG), edge intensity (EI), and the gradient-based similarity measure (Q^(AB/F)), exceeding the best values among the 13 compared methods by 0.75%, 0.15%, 1.56%, 1.52%, and 1.27%, respectively. On the TNO dataset, our method achieves the best values on EN, VIF, AG, and Q^(AB/F), exceeding the best values among the compared methods by 1.53%, 3.79%, 0.17%, and 6.94%, respectively. Ablation experiments further verify the effectiveness of each component of the fusion network, and we analyze the computational complexity of the algorithm to validate its efficiency.
Conclusion
In this study, we proposed a fusion network based on an information interaction linear Transformer, which introduces a cross-modal information interaction mechanism and achieves a lightweight network through the linear modules in the I2F Transformer. Experimental results show that, compared with 13 existing fusion methods, the proposed method is superior in enhancing visual quality, preserving texture details in complex scenes, and improving computational efficiency.
image fusion; information interaction; linear Transformer; Fourier transform loss; deep learning