Low-light optical flow estimation with hidden feature supervision using a Siamese network
Vol. 29, Issue 1, Pages 231-242 (2024)
Published: 16 January 2024
DOI: 10.11834/jig.230093
Xiao Zhaolin, Su Zhan, Zuo Fengyuan, Jin Haiyan. 2024. Low-light optical flow estimation with hidden feature supervision using a Siamese network. Journal of Image and Graphics, 29(01):0231-0242
Objective
Imaging under low-light conditions suffers from a low signal-to-noise ratio, motion blur, and similar degradations, which poses a great challenge to optical flow estimation. Unlike the existing "enhance first, estimate next" optical flow methods, and to avoid losing scene motion information during the low-light image enhancement stage, this paper proposes a Siamese network learning method for low-light optical flow estimation with hidden feature supervision.
Method
First, the method uses a weight-sharing Siamese network to extract mappable low-light and normal-light optical flow features. A K-nearest-neighbor correlation volume is then computed for adjacent low-light frames to avoid the high spatial and temporal complexity of the full 4D all-pairs correlation volume. An attention mechanism over 2D motion features is introduced into the global motion aggregation module to reduce the adverse effects of strong noise, motion blur, and low contrast on optical flow estimation under low light. Finally, a hidden-feature-supervised flow estimation module is proposed, in which normal-light flow features supervise the learning of low-light flow features to achieve high-accuracy optical flow estimation.
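The weight-sharing design can be pictured as a single encoder trunk applied to both image pairs. The following is a minimal PyTorch-style sketch, not the authors' implementation; the layer widths and module names are illustrative assumptions:

```python
import torch.nn as nn

class SiameseFlowEncoder(nn.Module):
    """Weight-sharing feature encoder (illustrative sketch): the same
    convolutional trunk processes the low-light and the normal-light
    inputs, so both branches map into one shared latent feature space."""

    def __init__(self, in_ch=3, feat_ch=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_ch, 64, 7, stride=2, padding=3),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, feat_ch, 3, stride=2, padding=1),
        )

    def forward(self, low_light, normal_light):
        # Identical weights on both inputs -- the "Siamese" property.
        f_low = self.trunk(low_light)
        f_normal = self.trunk(normal_light)
        return f_low, f_normal
```

Because every parameter is shared, low-light and normal-light features land in the same latent space, which is what makes the later hidden-feature supervision meaningful.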
Result
Comparative experiments with three state-of-the-art optical flow estimation methods show that, under normal illumination, the proposed method performs on par with the best existing method. On the flying chairs dark noise (FCDN) dataset, the proposed method achieves the best flow estimation performance, improving the end-point error accuracy by 0.16 over the second-best method; on the various brightness optical flow (VBOF) dataset, it improves the end-point error accuracy by 0.08.
Conclusion
This paper adopts a weight-sharing dual-branch Siamese network to accurately encode normal-light and low-light optical flow features, and uses supervised learning to achieve high-accuracy low-light optical flow estimation. Experimental results show that the proposed method has significant advantages in both the accuracy and the generalizability of low-light flow estimation. The code is available at https://github.com/suzhansz/LLCV-net.git.
Objective
Optical flow estimation has been widely used in target tracking, video temporal super-resolution, behavior recognition, scene depth estimation, and other vision applications. However, imaging under low-light conditions inevitably suffers from a low signal-to-noise ratio and motion blur, making low-light optical flow estimation very challenging. Applying a pre-stage low-light image enhancement can effectively improve visual quality, but it may not help subsequent optical flow estimation. Rather than following the "enhance first, estimate next" strategy, low-light enhancement should be considered jointly with optical flow estimation to prevent the loss of scene motion information. The optical flow features are encoded into a latent space, which enables supervised feature learning on paired low-light and normal-light datasets. This paper also shows that task-oriented feature enhancement outperforms general-purpose visual enhancement of low-light images. The main contributions of this paper can be summarized as follows: 1) A dual-branch Siamese network framework is proposed for low-light optical flow estimation. A weight-sharing block is used to establish the correlation of motion features between low-light and normal-light images. 2) An iterative low-light flow estimation module, which can be supervised using normal-light hidden features, is proposed. Our solution is free of explicit enhancement of low-light images.
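Contribution 2) hinges on supervising low-light latent features with their normal-light counterparts. The sketch below illustrates one way such a loss could look; the L1 distance and the stop-gradient on the teacher branch are assumptions for illustration, not the paper's exact formulation:

```python
import torch.nn.functional as F

def hidden_feature_supervision_loss(f_low, f_normal):
    """Supervise the low-light branch with the normal-light branch:
    the normal-light features act as a detached teacher signal.
    The L1 distance is an illustrative choice, not taken from the paper."""
    return F.l1_loss(f_low, f_normal.detach())
```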
Method
This paper proposes a dual-branch Siamese network to encode low-light and normal-light optical flow features. The encoded features are then used to estimate the optical flow in a supervised manner. Our dual-branch feature extractor is constructed with a weight-sharing block, which encodes the motion features. Importantly, our algorithm does not need the pre-stage low-light enhancement that most existing optical flow estimators employ. To overcome the high spatial and temporal computational complexity, this paper computes a K-nearest-neighbor correlation volume instead of the 4D all-pairs correlation volume. To better fuse local and global motion features, an attention mechanism for 2D motion feature aggregation is introduced. After feature extraction, a discriminator is used to distinguish low-light image features from normal-light image features; the feature extractor training is complete when the discriminator can no longer tell the two apart. To avoid explicit enhancement of low-light images, the final optical flow estimation module is composed of a feature enhancement block and a gated recurrent unit (GRU), and the optical flow is iteratively decoded from the enhanced features. A latent feature supervised loss and an iterative similarity loss are used to ensure stable convergence during training. In the experiments, the network is trained on an NVIDIA GeForce RTX 3080Ti GPU, with input images uniformly cropped to 496 × 368 pixels. Because paired low-light and normal-light datasets are scarce, the flying chairs dark noise (FCDN) and the various brightness optical flow (VBOF) datasets are jointly used for model training.
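To make the complexity argument concrete, the sketch below contrasts the 4D all-pairs correlation volume with a top-K (K-nearest-neighbor) variant that stores only the k strongest matches per source pixel. This is an illustrative PyTorch sketch, not the paper's implementation; for clarity it still materializes the full score matrix before pruning, whereas a practical version would compute the scores in tiles:

```python
import torch

def knn_correlation(f1, f2, k=32):
    """K-nearest-neighbor correlation volume (sketch).

    f1, f2: feature maps of shape (B, C, H, W) from the two frames.
    Instead of storing the full all-pairs volume of shape (B, HW, HW),
    keep only the k largest correlations per source pixel, reducing
    the stored volume from O((HW)^2) to O(HW * k) entries.
    """
    b, c, h, w = f1.shape
    f1 = f1.flatten(2).transpose(1, 2)          # (B, HW, C)
    f2 = f2.flatten(2)                          # (B, C, HW)
    corr = torch.matmul(f1, f2) / c ** 0.5      # (B, HW, HW) all-pairs scores
    topk_vals, topk_idx = corr.topk(k, dim=-1)  # k best matches per pixel
    return topk_vals, topk_idx                  # sparse volume + match locations
```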
Result
The proposed algorithm is compared with three state-of-the-art optical flow estimation models on several low-light and normal-light datasets, including the FCDN, VBOF, Sintel, and KITTI datasets. Besides the visual comparison, quantitative evaluation with the end-point error (EPE) metric is conducted. Experimental results show that the proposed method achieves performance comparable to the best available optical flow estimators under normal illumination. Under low-light conditions, the proposed solution improves the EPE by 0.16 on the FCDN dataset compared with the second-best solution, and by 0.08 on the VBOF dataset. Visual comparisons with all the compared methods are also provided; the results show that the proposed model preserves more accurate details than the other optical flow estimators, especially under low light.
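For reference, the EPE metric reported above is the mean Euclidean distance between predicted and ground-truth flow vectors, e.g.:

```python
import torch

def end_point_error(flow_pred, flow_gt):
    """Average end-point error (EPE): mean L2 distance between predicted
    and ground-truth flow fields, both of shape (B, 2, H, W)."""
    return torch.norm(flow_pred - flow_gt, p=2, dim=1).mean()
```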
Conclusion
In this paper, a dual-branch Siamese network is proposed to accurately encode optical flow features under normal-light and low-light conditions. The feature extractor is constructed with a weight-sharing block, which enables effective supervised learning for low-light optical flow estimation. The proposed model has remarkable advantages in accuracy and generalizability for flow estimation. The experimental results indicate that the proposed supervised low-light flow estimation outperforms state-of-the-art solutions in terms of precision.
Keywords: optical flow estimation; Siamese network; correlation volume; global motion aggregation; low-light image enhancement
References
Black M J and Anandan P. 1996. The robust estimation of multiple motions: parametric and piecewise-smooth flow fields. Computer Vision and Image Understanding, 63(1): 75-104 [DOI: 10.1006/cviu.1996.0006]
Brox T, Bruhn A, Papenberg N and Weickert J. 2004. High accuracy optical flow estimation based on a theory for warping//Proceedings of the 8th European Conference on Computer Vision. Prague, Czech Republic: Springer: 25-36 [DOI: 10.1007/978-3-540-24673-2_3]
Butler D J, Wulff J, Stanley G B and Black M J. 2012. A naturalistic open source movie for optical flow evaluation//Proceedings of the 12th European Conference on Computer Vision. Florence, Italy: Springer: 611-625 [DOI: 10.1007/978-3-642-33783-3_44]
Chen C, Chen Q F, Xu J and Koltun V. 2018. Learning to see in the dark//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 3291-3300 [DOI: 10.1109/CVPR.2018.00347]
Chen Y W, Yang H K, Chiu C C and Lee C Y. 2022. S2F2: single-stage flow forecasting for future multiple trajectories prediction//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 2535-2538 [DOI: 10.1109/CVPRW56347.2022.00285]
Danielyan A, Katkovnik V and Egiazarian K. 2012. BM3D frames and variational image deblurring. IEEE Transactions on Image Processing, 21(4): 1715-1728 [DOI: 10.1109/TIP.2011.2176954]
Dosovitskiy A, Fischer P, Ilg E, Hausser P, Hazirbas C, Golkov V, Smagt P V D, Cremers D and Brox T. 2015. FlowNet: learning optical flow with convolutional networks//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 2758-2766 [DOI: 10.1109/ICCV.2015.316]
Geiger A, Lenz P, Stiller C and Urtasun R. 2013. Vision meets robotics: the KITTI dataset. The International Journal of Robotics Research, 32(11): 1231-1237 [DOI: 10.1177/0278364913491297]
Gu Z H, Li F, Fang F M and Zhang G X. 2020. A novel retinex-based fractional-order variational model for images with severely low light. IEEE Transactions on Image Processing, 29: 3239-3253 [DOI: 10.1109/TIP.2019.2958144]
Horn B K P and Schunck B G. 1981. Determining optical flow. Artificial Intelligence, 17(1/3): 185-203 [DOI: 10.1016/0004-3702(81)90024-2]
Ilg E, Mayer N, Saikia T, Keuper M, Dosovitskiy A and Brox T. 2017. FlowNet 2.0: evolution of optical flow estimation with deep networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 1647-1655 [DOI: 10.1109/CVPR.2017.179]
Jiang S H, Lu Y, Li H D and Hartley R. 2021a. Learning optical flow from a few matches//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 16587-16595 [DOI: 10.1109/CVPR46437.2021.01632]
Jiang S H, Campbell D, Lu Y, Li H D and Hartley R. 2021b. Learning to estimate hidden motions with global motion aggregation//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 9752-9761 [DOI: 10.1109/ICCV48922.2021.00963]
Jiang Y F, Gong X Y, Liu D, Cheng Y, Fang C, Shen X H, Yang J C, Zhou P and Wang Z Y. 2021c. EnlightenGAN: deep light enhancement without paired supervision. IEEE Transactions on Image Processing, 30: 2340-2349 [DOI: 10.1109/TIP.2021.3051462]
Kong L T, Jiang B Y, Luo D H, Chu W Q, Huang X M, Tai Y, Wang C J and Yang J. 2022. IFRNet: intermediate feature refine network for efficient frame interpolation//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 1959-1968 [DOI: 10.1109/CVPR52688.2022.00201]
Li C Y, Guo C L, Han L H, Jiang J, Cheng M M, Gu J W and Loy C C. 2022a. Low-light image and video enhancement using deep learning: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(12): 9396-9416 [DOI: 10.1109/TPAMI.2021.3126387]
Li Y X, Lu Z C, Xiong X H and Huang J. 2022b. PERF-Net: pose empowered RGB-flow net//Proceedings of 2022 IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa, USA: IEEE: 798-807 [DOI: 10.1109/WACV51458.2022.00087]
Lipson L, Teed Z and Deng J. 2021. RAFT-Stereo: multilevel recurrent field transforms for stereo matching//Proceedings of 2021 IEEE International Conference on 3D Vision (3DV). London, UK: IEEE: 218-227 [DOI: 10.1109/3DV53792.2021.00032]
Lucas B D and Kanade T. 1981. An iterative image registration technique with an application to stereo vision//Proceedings of the 7th International Joint Conference on Artificial Intelligence. Vancouver, Canada: Morgan Kaufmann Publishers Inc.: 674-679
Ma L, Ma T Y and Liu R S. 2022. The review of low-light image enhancement. Journal of Image and Graphics, 27(5): 1392-1409 [DOI: 10.11834/jig.210852]
Ranjan A and Black M J. 2017. Optical flow estimation using a spatial pyramid network//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2720-2729 [DOI: 10.1109/CVPR.2017.291]
Ren X T, Yang W H, Cheng W H and Liu J Y. 2020. LR3M: robust low-light enhancement via low-rank regularized retinex model. IEEE Transactions on Image Processing, 29: 5862-5876 [DOI: 10.1109/TIP.2020.2984098]
Sun D Q, Yang X D, Liu M Y and Kautz J. 2018. PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 8934-8943 [DOI: 10.1109/CVPR.2018.00931]
Teed Z and Deng J. 2020. RAFT: recurrent all-pairs field transforms for optical flow//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 402-419 [DOI: 10.1007/978-3-030-58536-5_24]
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł and Polosukhin I. 2017. Attention is all you need//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 6000-6010
Xu H F, Zhang J, Cai J F, Rezatofighi H and Tao D C. 2022. GMFlow: learning optical flow via global matching//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 8111-8120 [DOI: 10.1109/CVPR52688.2022.00795]
Zhang M F, Zheng Y Q and Lu F. 2022. Optical flow in the dark. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(12): 9464-9476 [DOI: 10.1109/TPAMI.2021.3130302]
Zhao S Y, Zhao L, Zhang Z X, Zhou E Y and Metaxas D. 2022. Global matching with overlapping attention for optical flow estimation//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 17571-17580 [DOI: 10.1109/CVPR52688.2022.01707]
Zheng Y Q, Zhang M F and Lu F. 2020. Optical flow in the dark//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 6748-6756 [DOI: 10.1109/CVPR42600.2020.00678]