Image dehazing network based on dual-domain feature fusion
2025, pp. 1-15
Received: 2024-10-28; Revised: 2025-04-01; Accepted: 2025-04-08; Published online: 2025-04-09
DOI: 10.11834/jig.240655
Objective
Image dehazing aims to recover the latent haze-free image from a hazy input. Existing methods exploit the differences between clear/degraded image pairs in the spatial and frequency domains and achieve reasonable results, but three major problems remain: limited spatial-domain feature extraction and fusion, ineffective frequency-domain feature fusion, and the lack of efficient fusion of spatial- and frequency-domain features. To address these problems, we propose the dual-domain feature fusion network (DFFNet), which focuses on fusing features from both domains.
Methods
First, we design a spatial-domain feature fusion module (SFFM) better suited to soft image reconstruction. It adopts a Transformer-style architecture: a large-kernel attention mechanism captures global features and locates hazy regions, while a pixel attention mechanism models local features and restores edges and details; together the two mechanisms emulate multi-head self-attention and meet the requirements of soft reconstruction. We also propose a frequency-domain feature fusion module (FFFM). It processes high-frequency information implicitly, enhancing high-frequency components with multiple convolutional layers and fusing frequencies efficiently with multi-branch channel attention; placed at the network bottleneck, it fuses spatial- and frequency-domain features efficiently.
Results
Combining these two key module designs, DFFNet outperforms current state-of-the-art methods on two benchmark datasets. DFFNet-L is the first dehazing network whose peak signal-to-noise ratio (PSNR) exceeds 43 dB on the synthetic objective testing set-indoor (SOTS-Indoor) and the first to exceed 36 dB on the Haze4K dataset, reaching 43.83 dB and 36.39 dB and surpassing the state-of-the-art dehazing method MixDehazeNet-L by 1.21 dB and 0.45 dB, respectively. DFFNet is also more lightweight, with only 46.0% of the parameters and 67.1% of the floating-point operations of MixDehazeNet-L. Moreover, because the main modules of DFFNet, SFFM and FFFM, are readily transferable and extensible, they can be conveniently migrated to other computer vision tasks, offering a new way to improve model performance.
Conclusion
The proposed dual-domain feature fusion network combines the advantages of convolutional neural network models and Transformer models, effectively resolves the problems in dual-domain feature fusion, and achieves excellent dehazing performance. The code is available at https://github.com/WWJ0720/DFFNet.
Objective
Image dehazing is a crucial task in computer vision and image processing, aimed at recovering the latent haze-free image from images degraded by atmospheric scattering phenomena such as haze and fog. Image dehazing research holds significant practical importance and application value. It not only enhances the performance and reliability of visual systems under adverse weather conditions, providing technical support for critical applications such as autonomous driving, intelligent surveillance, and remote sensing image analysis, but also serves as a preprocessing step to improve the performance of advanced computer vision tasks including object detection, image classification, and scene understanding. At the same time, image dehazing technology has driven the development of atmospheric scattering models and image restoration theory, promoting innovative applications of deep learning in low-level computer vision tasks. It has substantial value for improving the effectiveness of public safety, environmental monitoring, and intelligent transportation systems, serving as a crucial supporting technology for achieving all-weather, all-scenario visual perception capabilities. Existing methods utilize the differences between clear/degraded image pairs in the spatial and frequency domains to achieve dehazing with some success. However, three major problems remain: limitations in spatial domain feature extraction and fusion, poor performance in frequency domain feature fusion, and inefficient integration of features from both domains. To address these issues, we propose the Dual-domain Feature Fusion Network (DFFNet), which focuses on the fusion of features from both spatial and frequency domains.
Methods
DFFNet is based on soft reconstruction to recover potential haze-free images and consists of two key modules. First, we design a Spatial-domain Feature Fusion Module (SFFM) more suitable for image soft reconstruction, adopting a Transformer-style architecture that captures global features through large-kernel attention mechanisms to accurately locate hazy regions in images, while modeling local features through pixel attention mechanisms to effectively restore image edges and details. These two attention mechanisms jointly satisfy the dual requirements for global and local features in the image soft reconstruction process, mapping and fusing them through a convolutional feed-forward network. The SFFM maintains a relatively small parameter scale and low computational complexity. Meanwhile, according to the spectral convolution theorem, visual feature processing in the spatial domain and frequency domain is essentially equivalent—convolution operations in the spatial domain are equivalent to multiplication operations in the frequency domain. Therefore, we choose to emphasize high-frequency components through convolution. We propose the Frequency-domain Feature Fusion Module (FFFM), which adopts an implicit method to process high-frequency information without requiring explicit Fourier transforms. By stacking multiple convolutions as high-pass filters, the module enhances high-frequency components in the input features and enriches their diversity, significantly improving the model's ability to restore high-frequency details in images. Increasing the number of stacked convolutional layers can extract richer high-frequency features, thereby enhancing the quality of the FFFM module's output features. However, considering parameter scale and computational efficiency, we ultimately choose to stack three convolutional layers to extract three different types of high-frequency features. 
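The spectral convolution theorem invoked above can be checked numerically. The following toy 1-D example (illustrative only, not part of DFFNet) verifies that circular convolution in the spatial domain equals element-wise multiplication of Fourier transforms in the frequency domain:

```python
import numpy as np

# Spectral convolution theorem on a 1-D toy signal:
# FFT(x circularly-convolved-with k) == FFT(x) * FFT(k).
rng = np.random.default_rng(0)
x = rng.standard_normal(64)   # toy "feature" signal
k = rng.standard_normal(64)   # toy "filter"

# Convolution computed in the frequency domain via FFT.
via_fft = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)))

# Reference circular convolution computed directly in the spatial domain.
n = len(x)
reference = np.array(
    [sum(x[m] * k[(i - m) % n] for m in range(n)) for i in range(n)]
)

print(np.allclose(via_fft, reference))  # True
```

This equivalence is why emphasizing certain frequency components can be done implicitly with convolutions, without an explicit Fourier transform.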
To better capture contextual information while optimizing computational overhead, the FFFM module is only applied between the encoder and decoder at the third scale. The output features of the first and second scale encoders are adaptively fused with the upsampled output features of the second and third scale decoders through SKFusion, respectively. The output features of the first scale decoder, after Patch Unembed, are processed together with the input image through the SR module to recover the potential haze-free image. In DFFNet, the fusion of spatial and frequency domain features occurs in the FFFM module located at the network bottleneck. The features at the bottleneck have already extracted rich multi-scale spatial information through multiple SFFM layers, possessing strong semantic expression capabilities. Simultaneously, as the connection point between the encoder and decoder, the features fused at this position can directly influence the subsequent recovery process, allowing high-frequency information to effectively guide the entire decoding process, thereby better restoring image edges and details.
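The implicit high-frequency extraction underlying the FFFM can be sketched in plain numpy. This is a simplified illustration only: a fixed mean filter stands in for the module's learned convolutions, and the multi-branch channel attention is omitted.

```python
import numpy as np

def box_blur(img, k=3):
    """Simple k x k mean filter with edge padding (a crude low-pass filter)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def high_frequency_bands(img, depth=3):
    """Extract `depth` high-frequency bands by repeatedly low-pass
    filtering and keeping the residual (image minus its blur)."""
    bands, current = [], img.astype(float)
    for _ in range(depth):
        low = box_blur(current)
        bands.append(current - low)  # high-pass residual
        current = low
    return bands

# A flat (constant) image has no high-frequency content: every band is ~0.
flat = np.full((16, 16), 0.5)
print(all(np.allclose(b, 0) for b in high_frequency_bands(flat)))  # True
```

Stacking three such stages mirrors the choice of three stacked convolutional layers described above, each stage contributing a different type of high-frequency feature.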
Results
The DFFNet, combining these two key module designs, demonstrates performance exceeding current state-of-the-art methods on two benchmark datasets. DFFNet-L is the first dehazing network to achieve peak signal-to-noise ratio (PSNR) values exceeding 43 dB on the SOTS-Indoor test set and 36 dB on the Haze4K dataset, with specific values of 43.83 dB and 36.39 dB, outperforming the current state-of-the-art image dehazing method MixDehazeNet-L by 1.21 dB and 0.45 dB respectively. Furthermore, DFFNet is more lightweight, with only 46.0% of the parameters and 67.1% of the floating-point operations compared to MixDehazeNet-L. In the visualization experiments, DFFNet-L demonstrates the highest quality of haze-free image restoration. This superior performance is primarily attributed to its large receptive field design, which enables the model to fully capture and utilize global contextual information in images for precise dehazing. This advantage allows DFFNet-L to more accurately identify the spatial distribution characteristics of haze in scenes, thereby achieving more thorough haze removal, more uniform and natural color restoration, and better overall visual effects in the recovered images. Simultaneously, DFFNet-L leverages the frequency domain differential features between clear and degraded image pairs, significantly enhancing its ability to restore high-frequency components of images. This results in sharper edge contours and clearer textural details in the recovered images, improving the accuracy of detail restoration and making the results more closely approximate the haze-free ground truth images. Moreover, the main modules of DFFNet, SFFM and FFFM, possess good transferability and scalability, allowing them to be conveniently transferred to other computer vision tasks, providing new solutions for improving model performance.
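The PSNR values reported here follow the standard definition, PSNR = 10·log10(MAX²/MSE). A minimal sketch of that metric:

```python
import numpy as np

def psnr(reference, restored, max_val=1.0):
    """Peak signal-to-noise ratio in dB between a ground-truth image and
    a restored image, both scaled to [0, max_val]."""
    mse = np.mean((reference.astype(float) - restored.astype(float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

gt = np.zeros((4, 4))
noisy = gt + 0.1            # uniform error of 0.1 -> MSE = 0.01
print(round(psnr(gt, noisy), 2))  # 20.0
```

Higher PSNR indicates a restored image closer to the haze-free ground truth, which is how the 43.83 dB and 36.39 dB figures should be read.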
Conclusion
This paper proposes a dual-domain feature fusion network that combines the advantages of convolutional neural network (CNN) models
and Transformer models, effectively addressing the existing problems in dual-domain feature fusion and achieving excellent dehazing results. The code is available at
https://github.com/WWJ0720/DFFNet.
Bai J W , Yuan L , Xia S T , Li Z F and Liu W . 2022 . Improving vision transformers by revisiting high-frequency components // Proceedings of 2022 European Conference on Computer Vision . Tel Aviv, Israel : Springer: 1 - 18 [ DOI: 10.1007/978-3-031-20053-3_1 http://dx.doi.org/10.1007/978-3-031-20053-3_1 ]
Berman D , Treibitz T and Avidan S . 2016 . Non-local image dehazing // Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition . Las Vegas, USA : IEEE: 1674 - 1682 [ DOI: 10.1109/CVPR.2016.185 http://dx.doi.org/10.1109/CVPR.2016.185 ]
Cai B L , Xu X M , Jia K , Qing C M and Tao D C . 2016 . DehazeNet: an end-to-end system for single image haze removal . IEEE Transactions on Image Processing , 25 ( 11 ): 5187 - 5198 [ DOI: 10.1109/TIP.2016.2598681 http://dx.doi.org/10.1109/TIP.2016.2598681 ]
Chen W T , Fang H Y , Hsieh C L , Tsai C C , Chen I H , Ding J J and Kuo S Y . 2021 . All snow removed: single image desnowing algorithm using hierarchical dual-tree complex wavelet representation and contradict channel loss // Proceedings of 2021 IEEE/CVF International Conference on Computer Vision . Montreal, Canada : IEEE: 4196 - 4205 [ DOI: 10.1109/ICCV48922.2021.00416 http://dx.doi.org/10.1109/ICCV48922.2021.00416 ]
Chen Z X , He Z W and Lu Z M . 2024 . DEA-Net: single image dehazing based on detail-enhanced convolution and content-guided attention . IEEE Transactions on Image Processing , 33 : 1002 – 1015 [ DOI: 10.1109/TIP.2024.3354108 http://dx.doi.org/10.1109/TIP.2024.3354108 ].
Cho S J , Ji S W , Hong J P , Jung S W and Ko S J . 2021 . Rethinking coarse-to-fine approach in single image deblurring // Proceedings of 2021 IEEE/CVF International Conference on Computer Vision . Montreal, Canada : IEEE: 4641 - 4650 [ DOI: 10.1109/ICCV48922.2021.00460 http://dx.doi.org/10.1109/ICCV48922.2021.00460 ]
Cui Y N and Knoll A . 2023 . Exploring the potential of channel interactions for image restoration . Knowledge-Based Systems , 282 : 111156 [ DOI: 10.1016/j.knosys.2023.111156 http://dx.doi.org/10.1016/j.knosys.2023.111156 ]
Cui Y N and Knoll A . 2024 . Dual-domain strip attention for image restoration . Neural Networks , 171 : 429 – 439 [ DOI: 10.1016/j.neunet.2023.12.003 http://dx.doi.org/10.1016/j.neunet.2023.12.003 ]
Cui Y N , Ren W Q and Knoll A . 2024 . Omni-kernel network for image restoration // Proceedings of 2024 AAAI conference on Artificial Intelligence . Washington, USA : AAAI Press: 1426 - 1434 [ DOI: 10.1609/aaai.v38i2.27907 http://dx.doi.org/10.1609/aaai.v38i2.27907 ]
Cui Y N , Ren W Q , Cao X C and Knoll A . 2023 . Focal network for image restoration // Proceedings of 2023 IEEE/CVF International Conference on Computer Vision . Paris, France : IEEE: 13001 - 13011 [ DOI: 10.1109/ICCV51070.2023.01195 http://dx.doi.org/10.1109/ICCV51070.2023.01195 ]
Cui Y N , Ren W Q , Cao X C and Knoll A . 2024 . Image restoration via frequency selection . IEEE Transactions on Pattern Analysis and Machine Intelligence , 46 ( 2 ): 1093 - 1108 [ DOI: 10.1109/TPAMI.2023.3330416 http://dx.doi.org/10.1109/TPAMI.2023.3330416 ]
Cui Y N , Ren W Q , Yang S N , Cao X C and Knoll A . 2023 . IRNeXt: rethinking convolutional network design for image restoration // Proceedings of the 40th International Conference on Machine Learning . Honolulu, USA : PMLR: 6545 - 6564 [ DOI: 10.5555/3618408.3618669 http://dx.doi.org/10.5555/3618408.3618669 ]
Cui Y N , Tao Y , Bing Z S , Ren W Q , Gao X W , Cao X C , Huang K and Knoll A . 2023 . Selective frequency network for image restoration // Proceedings of the 11th International Conference on Learning Representations . Washington DC : ICLR
Cui Y N , Tao Y , Jing L X and Knoll A . 2023 . Strip attention for image restoration // Proceedings of the 32 International Joint Conference on Artificial Intelligence . Macao, China : IJCAI: 645 - 653 [ DOI: 10.24963/ijcai.2023/72 http://dx.doi.org/10.24963/ijcai.2023/72 ]
Dai Y M , Gieseke F , Oehmcke S , Wu Y Q and Barnard K . 2021 . Attentional feature fusion // Proceedings of 2021 IEEE/CVF Winter Conference on Applications of Computer Vision . Waikoloa, USA : IEEE: 3560 - 3569 [ DOI: 10.1109/WACV48630.2021.00360 http://dx.doi.org/10.1109/WACV48630.2021.00360 ]
Dong H , Pan J S , Xiang L , Hu Z , Zhang X Y , Wang F and Yang M H . 2020 . Multi-scale boosted dehazing network with dense feature fusion // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle, USA : IEEE: 2157 - 2167 [ DOI: 10.1109/CVPR42600.2020.00223 http://dx.doi.org/10.1109/CVPR42600.2020.00223 ]
Guo C L , Yan Q X , Anwar S , Cong R M , Ren W Q and Li C Y . 2022 . Image dehazing transformer with transmission-aware 3d position embedding // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . New Orleans, USA : IEEE: 5812 - 5820 [ DOI: 10.1109/CVPR52688.2022.00572 http://dx.doi.org/10.1109/CVPR52688.2022.00572 ]
Guo M H , Lu C Z , Liu Z N , Cheng M M and Hu S M . 2023 . Visual attention network . Computational Visual Media , 9 ( 4 ): 733 - 752 [ DOI: 10.1007/s41095-023-0364-2 http://dx.doi.org/10.1007/s41095-023-0364-2 ]
Guo S , Yong H W , Zhang X D , Ma J Q and Zhang L . 2023 . Spatial-frequency attention for image denoising [EB/OL].[ 2023-2-27 ]. https://arxiv.org/pdf/2302.13598.pdf https://arxiv.org/pdf/2302.13598.pdf
He K M , Sun J and Tang X O . 2010 . Single image haze removal using dark channel prior . IEEE Transactions on Pattern Analysis and Machine Intelligence , 33 ( 12 ): 2341 - 2353 [ DOI: 10.1109/TPAMI.2010.168 http://dx.doi.org/10.1109/TPAMI.2010.168 ]
Katznelson Y . 2004 . An introduction to harmonic analysis [M]. United Kingdom : Cambridge University Press
Li B Y , Peng X L , Wang Z Y , Xu J Z and Feng D . 2017 . Aod-net: all-in-one dehazing network // Proceedings of 2017 IEEE International Conference on Computer Vision . Venice, Italy : IEEE: 4770 - 4778 [ DOI: 10.1109/ICCV.2017.511 http://dx.doi.org/10.1109/ICCV.2017.511 ]
Li B Y , Ren W Q , Fu D P , Tao D C , Feng D , Zeng W J and Wang Z Y . 2018 . Benchmarking single-image dehazing and beyond . IEEE Transactions on Image Processing , 27 ( 11 ): 5426 - 5440 [ DOI: 10.1109/TIP.2018.2867951 http://dx.doi.org/10.1109/TIP.2018.2867951 ]
Liu X H , Ma Y R , Shi Z H and Chen J . 2019 . GridDehazeNet: attention-based multi-scale network for image dehazing // Proceedings of 2019 IEEE International Conference on Computer Vision . Seoul, Korea (South) : IEEE: 7314 - 7323 [ DOI: 10.1109/ICCV.2019.00741 http://dx.doi.org/10.1109/ICCV.2019.00741 ]
Loshchilov I and Hutter F . 2018 . Decoupled weight decay regularization // Proceedings of the 6th International Conference on Learning Representations . Washington DC : ICLR
Lu L P , Xiong Q , Xu B R and Chu D F . 2024 . MixDehazeNet: mix structure block for image dehazing network // Proceedings of 2024 International Joint Conference on Neural Networks (IJCNN) . Yokohama, Japan : IEEE: 1 - 10 [ DOI: 10.1109/IJCNN60899.2024.10651326 http://dx.doi.org/10.1109/IJCNN60899.2024.10651326 ]
Narasimhan S G and Nayar S K . 2002 . Vision and the atmosphere . International Journal of Computer Vision , 48 : 233 - 254 [ DOI: 10.1023/A:1016328200723 http://dx.doi.org/10.1023/A:1016328200723 ]
Park N and Kim S . 2022 . How do vision transformers work? // Proceedings of the 10th International Conference on Learning Representations . Washington DC : ICLR
Qin X , Wang Z L , Bai Y C , Xie X D and Jia H Z . 2020 . FFA-Net: feature fusion attention network for single image dehazing // Proceedings of 2020 AAAI Conference on Artificial Intelligence . Palo Alto : AAAI Press: 11908 - 11915 [ DOI: 10.1609/aaai.v34i07.6865 http://dx.doi.org/10.1609/aaai.v34i07.6865 ]
Ronneberger O , Fischer P and Brox T . 2015 . U-net: convolutional networks for biomedical image segmentation // proceedings of 2015 Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference , Munich, Germany : Springer: 234 - 241 [ DOI: 10.1007/978-3-319-24574-4_28 http://dx.doi.org/10.1007/978-3-319-24574-4_28 ]
Song Y D , He Z Q , Qian H and Du X . 2023 . Vision transformers for single image dehazing . IEEE Transactions on Image Processing , 32 : 1927 - 1941 [ DOI: 10.1109/TIP.2023.3256763 http://dx.doi.org/10.1109/TIP.2023.3256763 ]
Song Y D , Zhou Y , Qian H and Du X . 2022 . Rethinking performance gains in image dehazing networks [EB/OL].[ 2022-09-23 ]. https://arxiv.org/pdf/2209.11448.pdf https://arxiv.org/pdf/2209.11448.pdf
Wang W H , Xie E Z , Li X , Fang D P , Song K T , Liang D , Lu T , Luo P and Shao L . 2022 . Pvt v2: improved baselines with pyramid vision transformer . Computational Visual Media , 8 ( 3 ): 415 - 424 [ DOI: 10.1007/s41095-022-0274-8 http://dx.doi.org/10.1007/s41095-022-0274-8 ]
Wang Z D , Cun X D , Bao J M , Zhou W G , Liu J Z and Li H Q . 2022 . Uformer: a general u-shaped transformer for image restoration // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . New Orleans, USA : IEEE: 17683 - 17693 [ DOI: 10.1109/CVPR52688.2022.01716 http://dx.doi.org/10.1109/CVPR52688.2022.01716 ]
Wu H Y , Qu Y Y , Lin S H , Zhou J , Qiao R Z , Zhang Z Z , Xie Y and Ma L Z . 2021 . Contrastive learning for compact single image dehazing // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , Nashville, USA : IEEE: 10551 - 10560 [ DOI: 10.1109/CVPR46437.2021.01041 http://dx.doi.org/10.1109/CVPR46437.2021.01041 ]
Valanarasu J M J , Yasarla R and Patel V M . 2022 . TransWeather: transformer-based restoration of images degraded by adverse weather conditions // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . New Orleans, USA : IEEE: 2353 - 2363 [ DOI: 10.1109/CVPR52688.2022.00239 http://dx.doi.org/10.1109/CVPR52688.2022.00239 ]
Xing X M and Liu W . 2016 . Haze removal for single traffic image . Journal of Image and Graphics , 21 ( 11 ): 1440 [ DOI: 10.11834/jig.20161103 http://dx.doi.org/10.11834/jig.20161103 ] (in Chinese)
Zheng Y , Zhan J H , He S F , Dong J Y and Du Y . 2023 . Curricular contrastive regularization for physics-aware single image dehazing // Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . Vancouver, Canada : IEEE: 5785 - 5794 [ DOI: 10.1109/CVPR52729.2023.00560 http://dx.doi.org/10.1109/CVPR52729.2023.00560 ]
Zhu Q S , Mai J M and Shao L . 2015 . A fast single image haze removal algorithm using color attenuation prior . IEEE Transactions on Image Processing , 24 ( 11 ): 3522 - 3533 [ DOI: 10.1109/TIP.2015.2446191 http://dx.doi.org/10.1109/TIP.2015.2446191 ]