Viewport-independent and deformation-unaware no-reference omnidirectional image quality assessment
2024, Vol. 29, No. 12, Pages 3699-3711
Print publication date: 2024-12-16
DOI: 10.11834/jig.240188
Yan Jiebin, Tan Ziwen, Wu Kangcheng, Liu Xuelin, Fang Yuming. 2024. Viewport-independent and deformation-unaware no-reference omnidirectional image quality assessment. Journal of Image and Graphics, 29(12):3699-3711
Objective
Omnidirectional image quality assessment (OIQA) aims to quantitatively describe the degradation of omnidirectional images and plays an important role in algorithm improvement and system optimization. Early OIQA methods were designed mainly by combining the geometric characteristics of omnidirectional images (such as the deformation in the polar regions and the uneven distribution of semantics) with 2D-IQA methods; because they ignore user viewing behavior, their performance is mediocre. Existing OIQA methods instead simulate user viewing behavior to extract a sequence of viewports, compute the distortion of each viewport, and then fuse the viewport distortions into a global quality score. However, predicting the viewport sequence is difficult, and the real-time performance and robustness of the prediction model are hard to guarantee. To address these problems, we propose a viewport-independent, deformation-unaware no-reference (NR) OIQA model (NR-OIQA). To handle the regular geometric deformation introduced by the equirectangular projection (ERP) of omnidirectional images, we propose a new convolution, termed equirectangular modulated deformable convolution, that can process irregular semantics and regular deformation simultaneously, and we build the NR-OIQA model on this convolution.
Method
The model consists of three components: a prior-guided patch sampling (PPS) module, a deformation-unaware feature extraction (DUFE) module, and an intra-inter patch attention aggregation (A-EPAA) module. The PPS module samples patches of identical resolution from the high-resolution omnidirectional image according to a prior probability distribution; the DUFE module progressively extracts quality-related features from the input patches through equirectangular modulated deformable convolution; and the A-EPAA module adjusts the features within each patch and the contribution of each patch to the overall quality score, improving the accuracy of the quality prediction, as sketched below.
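To make the patch-weighting idea concrete, the following is a minimal PyTorch sketch of attention-based patch aggregation: a learned weight per patch rescales that patch's contribution before pooling into one global score. The layer sizes and the simple linear scoring head are illustrative assumptions, not the paper's exact A-EPAA configuration.

```python
# A hypothetical sketch of the patch-attention idea in A-EPAA; dimensions
# and the linear heads are assumptions for illustration.
import torch
import torch.nn as nn


class PatchAttentionPool(nn.Module):
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)  # per-patch quality score
        self.attn = nn.Linear(feat_dim, 1)   # per-patch contribution weight

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, n_patches, feat_dim) from the feature extractor
        w = torch.softmax(self.attn(feats), dim=1)  # (B, N, 1), sums to 1 over patches
        q = self.score(feats)                       # (B, N, 1)
        return (w * q).sum(dim=1).squeeze(-1)       # (B,) global quality


# e.g., 16 images, 10 patches each, 256-d features:
# PatchAttentionPool()(torch.randn(16, 10, 256)) -> tensor of shape (16,)
```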
Result
We compare the proposed model with other IQA and OIQA models on three public databases. Compared with the top-performing Assessor360, our model reduces the number of parameters by 93.7% and the computational cost by 95.4%; compared with MC360IQA, which has a similar model size, it improves the Spearman rank correlation coefficient by 1.9%, 1.7%, and 4.3% on the CVIQ, OIQA, and JUFE databases, respectively.
Conclusion
The proposed NR-OIQA model fully accounts for the characteristics of omnidirectional images: it efficiently extracts distortion-aware quality features without relying on viewports, assesses omnidirectional image quality accurately, and has low computational cost.
Objective
With the rapid development of the virtual reality (VR) industry, the omnidirectional image has become an important medium of visual representation in VR, and it may degrade during acquisition, transmission, processing, and storage. Omnidirectional image quality assessment (OIQA) aims to quantitatively describe this degradation and plays a crucial role in algorithm improvement and system optimization. Omnidirectional images have some inherent characteristics: geometric deformation in the polar regions and semantic information concentrated around the equator. Viewing behavior can also conspicuously affect the perceived quality of an omnidirectional image. Early OIQA methods simply fuse these inherent characteristics into 2D-IQA methods and disregard user viewing behavior, thus obtaining suboptimal performance. Building on the viewport representation, which is in line with user viewing behavior, some deep learning-based OIQA methods have recently achieved promising performance by taking a predicted viewport sequence as the model input and computing its degradation. However, predicting the viewport sequence is difficult, and viewport extraction requires a series of pixel-wise computations, which leads to a significant computational load and hampers deployment in industrial environments. To address these problems, we propose a new no-reference OIQA model that introduces an equirectangular modulated deformable convolution (EquiMdconv), which can handle the irregular semantics and the regular deformation caused by equirectangular projection simultaneously, without a predicted viewport sequence.
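As a concrete illustration of this idea, below is a minimal PyTorch sketch of an EquiMdconv-style layer: a DCNv2-style branch predicts learnable offsets and modulation masks, and a fixed horizontal offset is added on top. The fixed offset is assumed here to follow the 1/cos(latitude) row-wise stretching of ERP, and the input is assumed to span the full latitude range; the paper's exact offset formulation may differ, so this is a sketch rather than the authors' implementation.

```python
# A minimal sketch of an EquiMdconv-style layer (not the authors' code). The
# fixed offsets stretch horizontal sampling positions by 1/cos(latitude), the
# row-wise stretching factor of equirectangular projection; a patch-based model
# would index rows by the patch's position in the full image instead.
import math

import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d


class EquiMdconv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, k: int = 3, stride: int = 1, padding: int = 1):
        super().__init__()
        self.k, self.stride, self.padding = k, stride, padding
        self.weight = nn.Parameter(torch.empty(out_ch, in_ch, k, k))
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        # DCNv2-style branch: 2*k*k learnable offsets plus k*k modulation masks.
        self.offset_mask = nn.Conv2d(in_ch, 3 * k * k, k, stride, padding)
        nn.init.zeros_(self.offset_mask.weight)
        nn.init.zeros_(self.offset_mask.bias)

    def fixed_offset(self, h: int, w: int, device) -> torch.Tensor:
        # Latitude of each output row in (-pi/2, pi/2); under ERP a row is
        # stretched horizontally by 1/cos(lat), so the extra horizontal step
        # for a kernel column at relative position x is x * (1/cos(lat) - 1).
        lat = (torch.arange(h, device=device) + 0.5) / h * math.pi - math.pi / 2
        stretch = 1.0 / lat.cos().clamp(min=0.1) - 1.0  # clamp avoids blow-up at the poles
        xs = torch.arange(self.k, device=device) - self.k // 2
        off = torch.zeros(2 * self.k ** 2, h, w, device=device)
        for i in range(self.k ** 2):  # (dy, dx) pairs, torchvision layout
            off[2 * i + 1] = xs[i % self.k] * stretch[:, None]  # dx only, dy stays 0
        return off.unsqueeze(0)  # (1, 2*k*k, h, w)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        om = self.offset_mask(x)
        o, m = om[:, : 2 * self.k ** 2], om[:, 2 * self.k ** 2 :]
        offset = o + self.fixed_offset(*o.shape[-2:], x.device)  # learnable + fixed
        return deform_conv2d(x, offset, self.weight, mask=torch.sigmoid(m),
                             stride=self.stride, padding=self.padding)
```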
Method
We propose a viewport-independent and deformation-unaware no-reference OIQA model. The model is composed of three parts: a prior-guided patch sampling (PPS) module, a deformation-unaware feature extraction (DUFE) module, and an intra-inter patch attention aggregation (A-EPAA) module. The PPS module samples a set of patches in a slice-based manner, on the basis of a prior probability distribution, to represent the quality information of the complete image. DUFE extracts the perceptual quality features of the input patches while accounting for their irregular semantics and regular deformation. It contains eight blocks, each comprising an EquiMdconv layer, a 1 × 1 convolutional layer, a batch normalization layer, and a 3 × 3 max pooling layer. The EquiMdconv layer employs modulated deformable convolution, whose learnable offsets model the distortions in the images more accurately; in addition, we add fixed offsets, derived from the regularity of the equirectangular deformation, to the deformable convolution offsets to effectively eliminate the regular deformation. A-EPAA comprises a convolutional block attention module (CBAM) and a patch attention (PA) module: CBAM reweights the perceptual quality features along both the channel and spatial dimensions, and PA adjusts the contribution of each patch to the overall quality assessment. We train the proposed model on the CVIQ, OIQA, and JUFE databases, splitting each database into 80% for training and 20% for testing. We sample 10 patches from each omnidirectional image, and the patch size is set to 224 × 224. All experiments are run on a server with an NVIDIA RTX A5000 GPU. The adaptive moment estimation (Adam) optimizer is used to optimize the model, which is trained for 300 epochs on the CVIQ and OIQA databases and 20 epochs on the JUFE database; the learning rate is 0.0001, and the batch size is 16.
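The sampling step can be illustrated as follows. This is a minimal sketch only: the abstract does not specify the prior distribution or the slice layout, so an equator-centred Gaussian over the vertical position is assumed, and the slice-based scheme is simplified to independent draws.

```python
# A hypothetical sketch of prior-guided patch sampling (PPS), assuming an
# equator-centred Gaussian latitude prior; n_patches=10 and size=224 follow
# the training setup described above. Requires erp_img height/width >= size.
import numpy as np


def pps_sample(erp_img: np.ndarray, n_patches: int = 10, size: int = 224,
               sigma: float = 0.25, rng=None) -> np.ndarray:
    """Sample n_patches patches of size x size from an ERP image (H, W, C)."""
    rng = rng or np.random.default_rng()
    h, w = erp_img.shape[:2]
    patches = []
    for _ in range(n_patches):
        # Latitude prior: rows near the equator (v = 0.5) are more likely,
        # matching the equatorial concentration of semantic content.
        v = float(np.clip(rng.normal(0.5, sigma), 0.0, 1.0))
        top = int(v * (h - size))
        left = int(rng.integers(0, w - size + 1))  # longitude is uniform
        patches.append(erp_img[top: top + size, left: left + size])
    return np.stack(patches)  # (n_patches, size, size, C)
```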
Result
We conduct experiments on three databases, namely, CVIQ, OIQA, and JUFE, and compare the proposed model with nine viewport-independent models and five viewport-dependent models. For a persuasive comparison, we adopt the Pearson linear correlation coefficient (PLCC) and the Spearman rank correlation coefficient (SRCC) as evaluation criteria. The results show that, compared with the state-of-the-art viewport-dependent model Assessor360, our model reduces the parameters by 93.7% and the floating-point operations by 95.4%. Compared with MC360IQA, which has a similar model size, our model increases the SRCC by 1.9%, 1.7%, and 4.3% on the CVIQ, OIQA, and JUFE databases, respectively.
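For reference, the two criteria can be computed directly with scipy; the scores below are toy values for illustration only, not data from the paper.

```python
# PLCC and SRCC between toy subjective scores and model predictions.
import numpy as np
from scipy import stats

mos = np.array([72.1, 55.3, 80.4, 43.9, 66.0])    # subjective scores (toy values)
pred = np.array([70.5, 58.2, 78.9, 40.1, 64.3])   # model predictions (toy values)

plcc, _ = stats.pearsonr(pred, mos)    # linear correlation
srcc, _ = stats.spearmanr(pred, mos)   # rank correlation
print(f"PLCC={plcc:.3f}, SRCC={srcc:.3f}")
```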
Conclusion
Our proposed viewport-independent and deformation-unaware no-reference OIQA model thoroughly considers the characteristics of the omnidirectional image. It can effectively extract quality features and accurately assess the quality of omnidirectional images with limited computational cost.
image quality assessment (IQA); omnidirectional image; deformable convolution; attention mechanism; no reference; viewport
Chen S J, Zhang Y X, Li Y M, Chen Z Z and Wang Z. 2018. Spherical structural similarity index for objective omnidirectional video quality assessment//Proceedings of 2018 IEEE International Conference on Multimedia and Expo. San Diego, USA: IEEE: 1-6 [DOI: 10.1109/ICME.2018.8486584]
Dai J F, Qi H Z, Xiong Y W, Li Y, Zhang G D, Hu H and Wei Y C. 2017. Deformable convolutional networks//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 764-773 [DOI: 10.1109/ICCV.2017.89]
Deng X, Wang H, Xu M, Guo Y C, Song Y H and Yang L. 2021. LAU-Net: latitude adaptive upscaling network for omnidirectional image super-resolution//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 9185-9194 [DOI: 10.1109/CVPR46437.2021.00907]
Djilali Y A D, McGuinness K and O’Connor N E. 2021. Simple baselines can fool 360° saliency metrics//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision Workshops. Montreal, Canada: IEEE: 3743-3749 [DOI: 10.1109/ICCVW54120.2021.00418]
Duan H Y, Zhai G T, Min X K, Zhu Y C, Fang Y and Yang X K. 2018. Perceptual quality assessment of omnidirectional images//Proceedings of 2018 IEEE International Symposium on Circuits and Systems. Florence, Italy: IEEE: 1-5 [DOI: 10.1109/ISCAS.2018.8351786]
Fang Y M, Huang L P, Yan J B, Liu X L and Liu Y. 2022. Perceptual quality assessment of omnidirectional images//Proceedings of the AAAI Conference on Artificial Intelligence. Vancouver, Canada: AAAI Press: 580-588 [DOI: 10.1609/aaai.v36i1.19937]
Fang Y M, Zhong Y, Yan J B and Liu L X. 2022. Deep attention guided image cropping with fine-grained feature aggregation. Journal of Image and Graphics, 27(2): 586-601 [DOI: 10.11834/jig.210544]
Fernandez-Labrador C, Facil J, Perez-Yus A, Demonceaux C, Civera J and Guerrero J J. 2020. Corners for layout: end-to-end layout recovery from 360 images. IEEE Robotics and Automation Letters, 5(2): 1255-1262 [DOI: 10.1109/LRA.2020.2967274]
Fu J, Hou C, Zhou W, Xu J H and Chen Z B. 2022. Adaptive hypergraph convolutional network for no-reference 360-degree image quality assessment//Proceedings of the 30th ACM International Conference on Multimedia. Lisbon, Portugal: ACM: 961-969 [DOI: 10.1145/3503161.3548337]
Hands D S and Avons S E. 2001. Recency and duration neglect in subjective assessment of television picture quality. Applied Cognitive Psychology, 15(6): 639-657 [DOI: 10.1002/acp.731]
He W and Pan C. 2022. The salient object detection based on attention-guided network. Journal of Image and Graphics, 27(4): 1176-1190 [DOI: 10.11834/jig.200658]
Kim H G, Lim H T and Ro Y M. 2020. Deep virtual reality image quality assessment with human perception guider for omnidirectional image. IEEE Transactions on Circuits and Systems for Video Technology, 30(4): 917-928 [DOI: 10.1109/TCSVT.2019.2898732]
Leng J X, Mo M J C, Zhou Y H, Ye Y M, Gao C Q and Gao X B. 2023. Recent advances in drone-view object detection. Journal of Image and Graphics, 28(9): 2563-2586 [DOI: 10.11834/jig.220836]
Li C, Xu M, Jiang L, Zhang S Y and Tao X M. 2019. Viewport proposal CNN for 360° video quality assessment//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 10169-10178 [DOI: 10.1109/CVPR.2019.01042]
Li D Q, Jiang T T and Jiang M. 2020. Norm-in-norm loss with faster convergence and better performance for image quality assessment//Proceedings of the 28th ACM International Conference on Multimedia. Seattle, USA: ACM: 789-797 [DOI: 10.1145/3394171.3413804]
Li L D, Yin Y T, Wu J J, Dong W S and Shi G M. 2022. Mask-fused human face image quality assessment method. Journal of Image and Graphics, 27(12): 3476-3490 [DOI: 10.11834/jig.210986]
Lim H T, Kim H G and Ro Y M. 2018. VR IQA NET: deep virtual reality image quality assessment using adversarial learning//Proceedings of 2018 IEEE International Conference on Acoustics, Speech and Signal Processing. Calgary, Canada: IEEE: 6737-6741 [DOI: 10.1109/ICASSP.2018.8461317]
Liu H F, Chen J J, Li L, Bao B K, Li Z C, Liu J Y and Nie L Q. 2023. Cross-modal representation learning and generation. Journal of Image and Graphics, 28(6): 1608-1629 [DOI: 10.11834/jig.230035]
Madhusudana P C, Birkbeck N, Wang Y L, Adsumilli B and Bovik A C. 2022. Image quality assessment using contrastive learning. IEEE Transactions on Image Processing, 31: 4149-4161 [DOI: 10.1109/TIP.2022.3181496]
Pan Z Q, Yuan F, Lei J J, Fang Y M, Shao X and Kwong S. 2022. VCRNet: visual compensation restoration network for no-reference image quality assessment. IEEE Transactions on Image Processing, 31: 1613-1627 [DOI: 10.1109/TIP.2022.3144892]
Sun W, Gu K, Ma S W, Zhu W H, Liu N and Zhai G T. 2018. A large-scale compressed 360-degree spherical image database: from subjective quality evaluation to objective model comparison//Proceedings of the 20th IEEE International Workshop on Multimedia Signal Processing. Vancouver, Canada: IEEE: 1-6 [DOI: 10.1109/MMSP.2018.8547102]
Sun W, Min X K, Zhai G T, Gu K, Duan H Y and Ma S W. 2020. MC360IQA: a multi-channel CNN for blind 360-degree image quality assessment. IEEE Journal of Selected Topics in Signal Processing, 14(1): 64-77 [DOI: 10.1109/JSTSP.2019.2955024]
Sun Y L, Lu A and Yu L. 2017. Weighted-to-spherically-uniform quality evaluation for omnidirectional video. IEEE Signal Processing Letters, 24(9): 1408-1412 [DOI: 10.1109/LSP.2017.2720693]
Wang Z, Bovik A C, Sheikh H R and Simoncelli E P. 2004. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4): 600-612 [DOI: 10.1109/TIP.2003.819861]
Wang Z and Rehman A. 2019. Begin with the end in mind: a unified end-to-end quality-of-experience monitoring, optimization and management framework. SMPTE Motion Imaging Journal, 128(2): 1-8 [DOI: 10.5594/JMI.2018.2887288]
Woo S, Park J, Lee J and Kweon I S. 2018. CBAM: convolutional block attention module//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 3-19 [DOI: 10.1007/978-3-030-01234-2_1]
Wu T H, Shi S W, Cai H M, Cao M D, Xiao J, Zheng Y Q and Yang Y J. 2024. Assessor360: multi-sequence network for blind omnidirectional image quality assessment//Proceedings of the 37th International Conference on Neural Information Processing Systems. New Orleans, USA: Curran Associates Inc.: 64957-64970 [DOI: 10.5555/3666122.3668956]
Xu J H, Zhou W and Chen Z B. 2021. Blind omnidirectional image quality assessment with viewport oriented graph convolutional networks. IEEE Transactions on Circuits and Systems for Video Technology, 31(5): 1724-1737 [DOI: 10.1109/TCSVT.2020.3015186]
Yan J B, Fang Y M and Liu X L. 2022. The review of distortion-related image quality assessment. Journal of Image and Graphics, 27(5): 1430-1466 [DOI: 10.11834/jig.210790]
Yang S D, Wu T H, Shi S W, Lao S S, Gong Y, Cao M D, Wang J H and Yang Y J. 2022. MANIQA: multi-dimension attention network for no-reference image quality assessment//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. New Orleans, USA: IEEE: 1190-1199 [DOI: 10.1109/CVPRW56347.2022.00126]
Yu M, Lakshman H and Girod B. 2015. A framework to evaluate omnidirectional video coding schemes//Proceedings of 2015 IEEE International Symposium on Mixed and Augmented Reality. Fukuoka, Japan: IEEE: 31-36 [DOI: 10.1109/ISMAR.2015.12]
Zakharchenko V, Choi K P and Park J H. 2016. Quality metric for spherical panoramic video//Proceedings of Optics and Photonics for Information Processing X. San Diego, USA: SPIE: #99700 [DOI: 10.1117/12.2235885]
Zhang C F and Liu S G. 2022. No-reference omnidirectional image quality assessment based on joint network//Proceedings of the 30th ACM International Conference on Multimedia. Lisbon, Portugal: ACM: 943-951 [DOI: 10.1145/3503161.3548175]
Zhang W X, Zhai G T, Wei Y, Yang X K and Ma K D. 2023. Blind image quality assessment via vision-language correspondence: a multitask learning perspective//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 14071-14081 [DOI: 10.1109/CVPR52729.2023.01352]
Zhou Y, Wang Y, Li L D, Gao C Q and Lu Z L. 2022. Research progress in objective quality evaluation of virtual reality images. Journal of Image and Graphics, 27(8): 2313-2328 [DOI: 10.11834/jig.210949]
Zhou Y F, Yu M, Ma H L, Shao H and Jiang G Y. 2018. Weighted-to-spherically-uniform SSIM objective quality evaluation for panoramic video//Proceedings of the 14th IEEE International Conference on Signal Processing. Beijing, China: IEEE: 54-57 [DOI: 10.1109/ICSP.2018.8652269]
Zhu X Z, Hu H, Lin S and Dai J F. 2019. Deformable ConvNets V2: more deformable, better results//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 9300-9308 [DOI: 10.1109/CVPR.2019.00953]
Zuo Y F, Fang Y M and Ma K D. 2023. The critical review of the growth of deep learning-based image fusion techniques. Journal of Image and Graphics, 28(1): 102-117 [DOI: 10.11834/jig.220556]