Multi-modality feature fusion-based wide field-of-view image generation
2025, Vol. 30, No. 1: 173-187
Print publication date: 2025-01-16
DOI: 10.11834/jig.240056
Jiang Zhiying, Zhang Zengxi, Liu Jinyuan, Liu Risheng. Multi-modality feature fusion-based wide field-of-view image generation[J]. Journal of Image and Graphics, 2025, 30(1): 173-187.
Objective
Image stitching integrates visible-light data captured from different viewpoints to generate a wide field-of-view composite. Adverse weather degrades the captured visible data and thus the stitching quality. Infrared sensors, which image by thermal radiation, highlight targets even under unfavorable conditions and overcome environmental and human interference.
Method
Given the imaging complementarity of infrared and visible sensors, this paper proposes an image stitching algorithm based on feature fusion of multi-modality (infrared and visible) data. The accurate structural features of the infrared data and the rich texture details of the visible data are exploited to estimate offsets in a coarse-to-fine manner, and the warping matrix is obtained through a non-parametric direct linear transformation (DLT). The stitched infrared and visible data are then fused to enrich the scene-perception information.
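As an illustrative sketch of the non-parametric DLT step (a standard technique, not the authors' released code; the function name and sample offsets below are hypothetical), the homography can be recovered in closed form from four predicted corner offsets, as in deep homography estimation (DeTone et al., 2016):

```python
# Minimal sketch: recover a 3x3 homography from four corner offsets via DLT.
# Standard formulation; names and sample values are illustrative only.
import numpy as np

def dlt_homography(src_pts: np.ndarray, offsets: np.ndarray) -> np.ndarray:
    """Solve H such that dst ~ H @ src for four point pairs.

    src_pts: (4, 2) corner coordinates in the reference view.
    offsets: (4, 2) displacements predicted by the offset-estimation network.
    """
    dst_pts = src_pts + offsets
    rows = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows)
    # The stacked constraints A h = 0 are solved by the right singular vector
    # of A associated with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # normalize so that H[2, 2] == 1

# Example: the four corners of a 512x512 image plus small predicted offsets.
corners = np.array([[0, 0], [511, 0], [0, 511], [511, 511]], dtype=float)
H = dlt_homography(corners, np.array([[4, 2], [-3, 5], [1, -2], [6, 3]], float))
```

In this parameterization the network regresses only eight scalars, and the warp itself is obtained in closed form, with no explicit feature-point detection or matching.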
Result
We selected a real-world dataset containing 530 pairs of stitchable multi-modal images and a synthetic dataset containing 200 pairs as test data, and compared 21 fusion-stitching strategies qualitatively and quantitatively. These strategies combine three recent fusion methods, RFN (residual fusion network), ReCoNet (recurrent correction network), and DATFuse (dual attention transformer), with seven stitching methods, APAP (as projective as possible), SPW (single-perspective warps), WPIS (wide parallax image stitching), SLAS (seam-guided local alignment and stitching), VFIS (view-free image stitching), RSFI (reconstructing stitched features to images), and UDIS++ (unsupervised deep image stitching). In stitching performance, our method achieves accurate cross-view scene alignment, reduces the average corner error by 53%, and avoids ghosting. In integrating multi-modal complementary information, our method adaptively balances the structural information of infrared images with the rich texture details of visible images, improving information entropy by 24.6% over the DATFuse-UDIS++ strategy.
Conclusion
Building on the complementary imaging strengths of infrared and visible images, the proposed method achieves more accurate wide-view scene generation through multi-scale recursive estimation and is more robust than conventional visible-image stitching.
Objective
Image stitching, a cornerstone of computer vision, assembles a comprehensive field-of-view image by merging visible data captured from multiple vantage points within a scene. This merging enhances scene perception and facilitates advanced processing. The current state of the art in image stitching primarily hinges on the detection of feature points within the scene, which must be densely and uniformly distributed throughout the image. However, these approaches encounter significant challenges in outdoor environments and military applications, where adverse conditions such as rain, haze, and low light can severely degrade the quality of visible images. This degradation impedes the extraction of feature points, a critical step in the stitching process. Furthermore, factors such as camouflage and occlusion can cause data loss, disrupting the distribution of feature points and thus compromising the quality of the stitched image. These limitations often manifest as ghosting, undermining the effectiveness and robustness of stitching in practical applications. In this challenging context, infrared sensors, which image scenes by detecting thermal radiation, emerge as a robust alternative: they highlight targets even under unfavorable conditions and mitigate the impact of environmental and human factors, making them highly valuable in military surveillance. However, a significant drawback of thermal imaging is its inability to capture the rich texture details that abound in visible images, details that are crucial for an accurate and comprehensive perception of the scene.
Method
This paper proposes an image stitching algorithm that overcomes the limitations inherent in conventional visible image stitching and extends the applicability of stitching technology across diverse environments. The algorithm is based on the fusion of features from multi-modality images, specifically infrared and visible images. By exploiting the complementary characteristics of infrared and visible data, our approach integrates the precise structural features of infrared images with the rich textural detail of visible images, which is crucial for accurate homography estimation across viewpoints. A distinctive aspect of our method is a learnable feature pyramid that estimates sparse offsets in a gradual, coarse-to-fine manner, from which the deformation matrix is derived through a non-parametric direct linear transformation. Finally, the stitched infrared and visible data are fused to enrich the perceptual information of the generated scene: deep features are mined for contextual semantic information, while shallow features compensate for the deficiencies of upsampled data, producing more accurate and reliable fused results.
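To make the deep/shallow interplay concrete, the following minimal PyTorch sketch (channel sizes are assumptions, and FusionDecoder is a hypothetical name, not the authors' architecture) shows deep contextual features being upsampled and then compensated by shallow skip features:

```python
# Minimal sketch of the fusion idea: deep features carry contextual semantics,
# shallow skip features restore the detail lost during upsampling.
# Hypothetical module; channel sizes are assumptions, not the paper's values.
import torch
import torch.nn as nn

class FusionDecoder(nn.Module):
    def __init__(self, deep_ch: int = 128, shallow_ch: int = 32):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.refine = nn.Sequential(
            nn.Conv2d(deep_ch + shallow_ch, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, kernel_size=3, padding=1),  # single-channel fused map
        )

    def forward(self, deep_feat: torch.Tensor, shallow_feat: torch.Tensor):
        x = self.up(deep_feat)                   # recover spatial resolution
        x = torch.cat([x, shallow_feat], dim=1)  # inject shallow detail
        return self.refine(x)

# Deep features at half resolution, shallow features at full resolution.
decoder = FusionDecoder()
fused = decoder(torch.randn(1, 128, 64, 64), torch.randn(1, 32, 128, 128))
print(fused.shape)  # torch.Size([1, 1, 128, 128])
```

The skip connection plays exactly the compensating role described above: bilinear upsampling of deep features cannot recover high-frequency detail, so the shallow features reintroduce it before the final refinement.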
Result
We selected a real-world dataset comprising 530 pairs of stitchable multi-modal images and a synthetic dataset containing 200 pairs as test data. We compared the qualitative and quantitative performance of 21 fusion-stitching strategies, combining three recent fusion methods, namely RFN (residual fusion network), ReCoNet (recurrent correction network), and DATFuse (dual attention transformer), with seven stitching methods, namely APAP (as projective as possible), SPW (single-perspective warps), WPIS (wide parallax image stitching), SLAS (seam-guided local alignment and stitching), VFIS (view-free image stitching), RSFI (reconstructing stitched features to images), and UDIS++ (unsupervised deep image stitching). In terms of stitching performance, our method achieved accurate cross-view scene alignment, reducing the average corner error by 53% and preventing ghosting and abnormal distortion; it even outperformed existing feature-point-based stitching algorithms in challenging large-baseline scenarios. In terms of integrating multi-modal complementary information, our method adaptively balanced the robust structural cues of infrared images with the rich texture details of visible images, increasing information entropy by 24.6% over the DATFuse-UDIS++ strategy.
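For reference, both reported metrics follow standard definitions; a minimal NumPy sketch (not code from the paper, function names are illustrative) is:

```python
# Standard definitions of the two metrics quoted above: Shannon entropy over
# the gray-level histogram (Roberts et al., 2008) and average corner error.
import numpy as np

def information_entropy(gray: np.ndarray) -> float:
    """Shannon entropy (in bits) of an 8-bit grayscale image."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins to avoid log(0)
    return float(-np.sum(p * np.log2(p)))

def average_corner_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Euclidean distance between warped and ground-truth corners, (4, 2) arrays."""
    return float(np.linalg.norm(pred - gt, axis=1).mean())
```

Higher entropy indicates that the fused result retains more information from both modalities; lower corner error indicates tighter cross-view alignment.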
Conclusion
The proposed infrared-and-visible image stitching method effectively addresses the key limitations of traditional image stitching, particularly under adverse environmental conditions, and demonstrates superior performance over existing methods. It also broadens the scope of stitching technology, making it more versatile and applicable in diverse settings. The combination of infrared and visible imagery could substantially improve scene perception and processing, especially in military and outdoor applications where accuracy, detail, and robustness are paramount. Furthermore, the algorithm's ability to fuse different modalities opens new avenues for research and application, with potential uses in environmental monitoring, search-and-rescue operations, and even artistic and creative domains where novel visual representations are sought. The fusion employed in our algorithm not only enhances the visual quality of the stitched images but also adds a layer of information that can be vital in critical applications such as surveillance and reconnaissance. This advancement enhances our ability to perceive and process scenes effectively and paves the way for future innovations in image processing and analysis.
multi-modality image fusion; image stitching; convolutional neural network (CNN); infrared and visible images; multi-scale pyramid
Chang C H, Sato Y and Chuang Y Y. 2014. Shape-preserving half-projective warps for image stitching//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 3254-3261 [DOI: 10.1109/CVPR.2014.422]
Cui G M, Feng H J, Xu Z H, Li Q and Chen Y T. 2015. Detail preserved fusion of visible and infrared images using regional saliency extraction and multi-scale image decomposition. Optics Communications, 341: 199-209 [DOI: 10.1016/j.optcom.2014.12.032]
DeTone D, Malisiewicz T and Rabinovich A. 2016. Deep image homography estimation [EB/OL]. [2023-12-27]. https://arxiv.org/pdf/1606.03798.pdf
Eskicioglu A M and Fisher P S. 1995. Image quality measures and their performance. IEEE Transactions on Communications, 43(12): 2959-2965 [DOI: 10.1109/26.477498]
Gao J H, Kim S J and Brown M S. 2011. Constructing image panoramas using dual-homography warping//Proceedings of the CVPR 2011. Colorado Springs, USA: IEEE: 49-56 [DOI: 10.1109/CVPR.2011.5995433]
Gao J H, Li Y, Chin T J and Brown M S. 2013. Seam-driven image stitching//Eurographics (Short Papers). Girona, Spain: The Eurographics Association: 45-48
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Hoang V D, Tran D P, Nhu N G, Pham T A and Pham V H. 2020. Deep feature extraction for panoramic image stitching//Proceedings of the 12th Asian Conference on Intelligent Information and Database Systems. Phuket, Thailand: Springer: 141-151 [DOI: 10.1007/978-3-030-42058-1_12]
Hu J, Shen L and Sun G. 2018. Squeeze-and-excitation networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7132-7141 [DOI: 10.1109/CVPR.2018.00745]
Huang G, Liu Z, Van Der Maaten L and Weinberger K Q. 2017. Densely connected convolutional networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2261-2269 [DOI: 10.1109/CVPR.2017.243]
Huang Z B, Liu J Y, Fan X, Liu R S, Zhong W and Luo Z X. 2022. ReCoNet: recurrent correction network for fast and efficient multi-modality image fusion//Proceedings of the 17th European Conference on Computer Vision. Tel Aviv, Israel: Springer: 539-555 [DOI: 10.1007/978-3-031-19797-0_31]
Jia Q, Li Z J, Fan X, Zhao H T, Teng S Y, Ye X C and Latecki L J. 2021. Leveraging line-point consistence to preserve structures for wide parallax image stitching//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 12181-12190 [DOI: 10.1109/CVPR46437.2021.01201]
Lai W S, Gallo O, Gu J W, Sun D Q, Yang M H and Kautz J. 2019. Video stitching for linear camera arrays [EB/OL]. [2023-12-27]. https://arxiv.org/pdf/1907.13622.pdf
Lee K Y and Sim J Y. 2020. Warping residual based image stitching for large parallax//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 8198-8206 [DOI: 10.1109/CVPR42600.2020.00822]
Leng J X, Mo M J C, Zhou Y H, Ye Y M, Gao C Q and Gao X B. 2023. Recent advances in drone-view object detection. Journal of Image and Graphics, 28(9): 2563-2586 [DOI: 10.11834/jig.220836]
Li H, Wu X J and Kittler J. 2021. RFN-Nest: an end-to-end residual fusion network for infrared and visible images. Information Fusion, 73: 72-86 [DOI: 10.1016/j.inffus.2021.02.023]
Li X Y, Ye Z H, Wei S K, Chen Z, Chen X T, Tian Y H, Dang J W, Fu S J and Zhao Y. 2023. 3D object detection for autonomous driving from image: a survey - benchmarks, constraints and error analysis. Journal of Image and Graphics, 28(6): 1709-1740 [DOI: 10.11834/jig.230036]
Liao T L and Li N. 2020. Single-perspective warps in natural image stitching. IEEE Transactions on Image Processing, 29: 724-735 [DOI: 10.1109/TIP.2019.2934344]
Liao T L, Zhao C Y, Li L and Cao H L. 2023. Seam-guided local alignment and stitching for large parallax images [EB/OL]. [2023-12-27]. https://arxiv.org/pdf/2311.18564.pdf
Lin C C, Pankanti S U, Ramamurthy K N and Aravkin A Y. 2015. Adaptive as-natural-as-possible image stitching//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 1155-1163 [DOI: 10.1109/CVPR.2015.7298719]
Lin K M, Jiang N J, Cheong L F, Do M and Lu J B. 2016. SEAGULL: seam-guided local alignment for parallax-tolerant image stitching//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 370-385 [DOI: 10.1007/978-3-319-46487-9_23]
Lin W Y, Liu S Y, Matsushita Y, Ng T T and Cheong L F. 2011. Smoothly varying affine stitching//Proceedings of the CVPR 2011. Colorado Springs, USA: IEEE: 345-352 [DOI: 10.1109/CVPR.2011.5995314]
Lou Z Y and Gevers T. 2014. Image alignment by piecewise planar region matching. IEEE Transactions on Multimedia, 16(7): 2052-2061 [DOI: 10.1109/TMM.2014.2346476]
Mittal A, Moorthy A K and Bovik A C. 2012. No-reference image quality assessment in the spatial domain. IEEE Transactions on Image Processing, 21(12): 4695-4708 [DOI: 10.1109/TIP.2012.2214050]
Nie L, Lin C Y, Liao K, Liu M Q and Zhao Y. 2020. A view-free image stitching network based on global homography. Journal of Visual Communication and Image Representation, 73: #102950 [DOI: 10.1016/j.jvcir.2020.102950]
Nie L, Lin C Y, Liao K, Liu S C and Zhao Y. 2021. Unsupervised deep image stitching: reconstructing stitched features to images. IEEE Transactions on Image Processing, 30: 6184-6197 [DOI: 10.1109/TIP.2021.3092828]
Nie L, Lin C Y, Liao K, Liu S C and Zhao Y. 2023. Parallax-tolerant unsupervised deep image stitching [EB/OL]. [2023-12-27]. https://arxiv.org/pdf/2302.08207.pdf
Roberts J W, Van Aardt J A and Ahmed F B. 2008. Assessment of image fusion procedures using entropy, image quality, and multispectral classification. Journal of Applied Remote Sensing, 2(1): #023522 [DOI: 10.1117/1.2945910]
Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer: 234-241 [DOI: 10.1007/978-3-319-24574-4_28]
Shen C W, Ji X Y and Miao C L. 2019. Real-time image stitching with convolutional neural networks//Proceedings of 2019 IEEE International Conference on Real-time Computing and Robotics. Irkutsk, Russia: IEEE: 192-197 [DOI: 10.1109/RCAR47638.2019.9044010]
Shi Z F, Li H, Cao Q J, Ren H Z and Fan B Y. 2020. An image mosaic method based on convolutional neural network semantic features extraction. Journal of Signal Processing Systems, 92(4): 435-444 [DOI: 10.1007/s11265-019-01477-2]
Shi Z H, Wu C W, Li C J, You Z Z, Wang Q and Ma C C. 2023. Object detection techniques based on deep learning for aerial remote sensing images: a survey. Journal of Image and Graphics, 28(9): 2616-2643 [DOI: 10.11834/jig.221085]
Simonyan K and Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition [EB/OL]. [2023-12-27]. https://arxiv.org/pdf/1409.1556.pdf
Tang W, He F Z, Liu Y, Duan Y S and Si T Z. 2023. DATFuse: infrared and visible image fusion via dual attention transformer. IEEE Transactions on Circuits and Systems for Video Technology, 33(7): 3159-3172 [DOI: 10.1109/TCSVT.2023.3234340]
Xu H, Ma J Y, Le Z L, Jiang J J and Guo X J. 2020. FusionDN: a unified densely connected network for image fusion//Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI: 12484-12491 [DOI: 10.1609/aaai.v34i07.6936]
Zaragoza J, Chin T J, Brown M S and Suter D. 2013. As-projective-as-possible image stitching with moving DLT//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, USA: IEEE: 2339-2346 [DOI: 10.1109/CVPR.2013.303]
Zhang F and Liu F. 2014. Parallax-tolerant image stitching//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 3262-3269 [DOI: 10.1109/CVPR.2014.423]