A Non-Causal Selective State Space Model for Image Restoration
2024, pages 1-14
Published online: 2024-12-30
DOI: 10.11834/jig.240517
Xiao Jie, Fan Zihao, Li Dong, et al. A Non-Causal Selective State Space Model for Image Restoration[J]. Journal of Image and Graphics,
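Objective
Image restoration is a classic research problem in computer vision. Selective state space models (SSMs) are widely used in image restoration tasks because of their efficient sequence modeling. Meanwhile, dependencies among non-local image patches can help improve restoration performance. However, conventional SSMs scan tokens in a deterministic order and can only extract unidirectional dependencies along the token sequence. The modeling of relationships between tokens is therefore constrained by causality according to their order in the sequence, which conflicts with the non-causal relationships among image patches and limits further gains in restoration performance. To address this problem, a non-causal selective state space model for image restoration is proposed, which aims to enable SSMs to model non-causal dependencies between tokens.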
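Method
To resolve the conflict between the causal modeling of SSMs and the non-causal relationships in image content, a random scanning strategy is proposed. It overcomes the causal and spatial limitations of conventional scanning and enables non-causal modeling over the token sequence. Specifically, random shuffle and inverse shuffle functions are constructed so that tokens are scanned in a non-fixed order, which effectively models the non-causal dependencies between different tokens. In addition, because image degradations vary in spatial scale and have complex morphological structures, multi-scale priors are incorporated to build a non-causal Mamba model (Non-Causal Mamba, NCMamba) in which local and global information complement each other, so that the model adapts effectively to various image restoration tasks.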
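Result
Experiments are conducted on image denoising, deblurring, and shadow removal, and they verify the effectiveness of the proposed non-causal modeling and local-global complementary strategies. For example, compared with existing methods, the proposed model improves the peak signal-to-noise ratio (PSNR) by 0.86 dB on the SRD shadow removal dataset.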
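Conclusion
For image restoration, a non-causal selective state space model is constructed that models non-causal dependencies between tokens and effectively combines complementary local and global information, significantly improving restoration performance. Experimental results show that the proposed method achieves excellent performance on both subjective and objective evaluation metrics and offers a new solution for image restoration.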
Objective
Image restoration improves the quality of degraded images and supports downstream applications such as image segmentation, object detection, and autonomous driving. The task is to reconstruct a high-quality image from a degraded observation affected by distortions such as noise, blur, and compression artifacts. Deep learning has substantially improved restoration performance through models such as Convolutional Neural Networks (CNNs), graph models, and Vision Transformers designed for complex degradation scenarios. Mamba, a recently introduced selective state space model, has driven progress in natural language processing and computer vision. It combines strong expressive power with higher computational efficiency and a smaller memory footprint than previous models. To apply Mamba to images, tokens are first extracted from the image, and Mamba then models the interactions and dependencies along the resulting token sequence. In image restoration, tokens represent image patches, and their dependencies are generally non-causal: patches can interact regardless of their spatial distance or order. Existing selective state space models, however, scan tokens in a deterministic order, which imposes causal, one-way interactions along the sequence. This mismatch conflicts with the non-causal relationships among patches and may limit the restoration performance of these models.
To bridge this gap, we introduce NCMamba, a model that reconciles the causal bias of selective state space models with the non-causal nature of image patch interactions. We validate its effectiveness on diverse image restoration tasks.
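The one-way dependency imposed by a fixed scan order can be illustrated with a toy example. The sketch below is not the Mamba operator: it assumes a scalar linear recurrence h_t = a * h_{t-1} + b * x_t with fixed, hypothetical coefficients `a` and `b`, whereas a selective state space model makes its parameters input-dependent. It only shows that, under a left-to-right scan, perturbing one token changes the outputs at that position and later ones, never earlier ones.

```python
import numpy as np

def causal_scan(x, a=0.9, b=1.0):
    """Toy recurrence h_t = a * h_{t-1} + b * x_t, scanned left to right.

    The coefficients a and b are illustrative assumptions, not values
    taken from the paper.
    """
    h = 0.0
    out = np.empty(len(x), dtype=float)
    for t, x_t in enumerate(x):
        h = a * h + b * x_t
        out[t] = h
    return out

tokens = np.arange(1.0, 9.0)      # 8 scalar "tokens" in scan order
perturbed = tokens.copy()
perturbed[5] += 10.0              # change only the token at position 5

diff = causal_scan(perturbed) - causal_scan(tokens)
changed = np.nonzero(np.abs(diff) > 1e-12)[0]
print(changed)                    # [5 6 7]: earlier positions are unaffected
```

In this toy setting the token at position 5 cannot influence positions 0 to 4, which is the kind of order-induced constraint that conflicts with non-causal relationships among image patches.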
Method
To resolve the conflict between the causal modeling inherent in selective state space models and the non-causal relationships among image patches, we introduce a stochastic token scanning strategy. Unlike conventional deterministic scanning, it removes the causal constraint imposed by a fixed token order. A random shuffle function stochastically permutes the token sequence, which removes the notion of “distance” between tokens, so that after shuffling every token pair has the same probability of being processed directly by the selective state space model. An inverse shuffle function then restores the tokens to their original order so that the spatial structure of the image can be reconstructed accurately. Combining multi-scale, local, and global relationship priors, the proposed model, NCMamba (Non-Causal Mamba), integrates local-global complementary modeling within a UNet structure and applies to diverse image restoration tasks.
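The shuffle and inverse-shuffle step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `random_shuffle`, `inverse_shuffle`, and `toy_scan`, the token shape, and the use of a simple recurrence as a stand-in for the selective state space layer are all hypothetical.

```python
import numpy as np

def random_shuffle(tokens, rng):
    """Randomly permute tokens along the sequence axis (axis 0).

    Returns the shuffled tokens and the permutation that was applied.
    """
    perm = rng.permutation(tokens.shape[0])
    return tokens[perm], perm

def inverse_shuffle(shuffled, perm):
    """Undo random_shuffle by returning every token to its original index."""
    inv = np.empty_like(perm)
    inv[perm] = np.arange(perm.shape[0])
    return shuffled[inv]

def toy_scan(x, a=0.9):
    """Hypothetical stand-in for the sequence model: a causal recurrence."""
    out = np.empty_like(x)
    h = np.zeros(x.shape[1:], dtype=x.dtype)
    for t in range(x.shape[0]):
        h = a * h + x[t]
        out[t] = h
    return out

rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 4))   # 16 patch tokens, 4 channels (assumed sizes)

shuffled, perm = random_shuffle(tokens, rng)
restored = inverse_shuffle(shuffled, perm)
assert np.array_equal(restored, tokens)  # the round trip recovers the original order

# Scan in the random order, then map the features back to spatial order.
features = inverse_shuffle(toy_scan(shuffled), perm)
```

Because a fresh permutation is drawn each time, which token precedes which in the scan is not fixed by spatial position, while the inverse permutation keeps the output aligned with the original patch layout.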
Result
NCMamba was evaluated on multiple datasets for image denoising, deblurring, and shadow removal. The experiments show that NCMamba consistently outperforms established CNN and Vision Transformer models. Notably, on the SRD dataset for shadow removal, NCMamba improves PSNR by 0.86 dB over the ShadowDiffusion method. Ablation studies further validate the effectiveness of the non-causal modeling and local-global complementary strategies.
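For readers less familiar with the metric, the sketch below computes PSNR with the standard definition 10 * log10(MAX^2 / MSE) and converts a dB gain into the equivalent ratio of mean squared errors. The evaluation protocol used in the paper (color space, data range, any masking) is not stated in this abstract, so the `max_value=1.0` default and the helper name `mse_ratio_for_gain` are assumptions made for illustration.

```python
import numpy as np

def psnr(reference, restored, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    ref = np.asarray(reference, dtype=np.float64)
    out = np.asarray(restored, dtype=np.float64)
    mse = np.mean((ref - out) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """MSE of the better method divided by MSE of the baseline for a PSNR gain."""
    return 10.0 ** (-gain_db / 10.0)

ratio = mse_ratio_for_gain(0.86)     # roughly 0.82
```

Under this definition, and for a comparison on a single image with the same data range, a 0.86 dB PSNR gain corresponds to roughly 18% lower mean squared error.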
Conclusion
This study introduces NCMamba, a selective state space model tailored to image restoration. Extensive experiments confirm that it outperforms conventional models, making it a robust solution for complex image restoration tasks. The empirical evidence shows that NCMamba delivers significant improvements in image quality under various conditions.
Image restoration; Selective state space model; Non-causal modeling; Multi-scale modeling; Image processing