The basic framework and key algorithms of parallel vision
Journal of Image and Graphics, 2021, Vol. 26, No. 1, pp. 82-92
Print publication date: 2021-01-16
Acceptance date: 2020-10-29
DOI: 10.11834/jig.200400
Hui Zhang, Xuan Li, Feiyue Wang. The basic framework and key algorithms of parallel vision[J]. Journal of Image and Graphics, 2021, 26(1): 82-92.
Objective
With the rapid development of computer science and artificial intelligence, visual perception technology has advanced rapidly. However, visual perception methods, dominated by deep learning, depend on large-scale and diverse datasets. This paper therefore proposes parallel vision, a visual analysis framework based on parallel learning: it supplements vision algorithms with sufficient image data in the form of large numbers of finely annotated artificial images, thereby turning the computer into a "laboratory" of computational intelligence.
Method
First, an artificial image system simulates the imaging conditions that may occur in real images, uses the internal parameters of the system to obtain annotation information automatically, and produces artificial image data that meet the requirements. Then, a visual perception model is designed with predictive learning, and computational experiments are carried out on the large amount of image data generated by the artificial image system; this makes it convenient to study how difficult scenes, such as complex environmental conditions, affect the visual perception model, turns factors that are uncontrollable in practice into controllable ones, and increases the interpretability of the visual model. Finally, prescriptive learning feeds back to optimize the model parameters: the difficulties the visual perception model encounters in real scenes guide its training in artificial scenes, and the model is learned and optimized online through virtual-real interaction. Because many researchers have already devoted themselves to building artificial scenes and generating large numbers of virtual images, this paper adopts these existing artificial scene images, augments the real scene images by flipping, cropping, and scaling, and then carries out application case studies focusing on computational experiments and predictive learning.
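The augmentation of real images mentioned above (flipping, cropping, and scaling) can be made concrete with a short sketch. The snippet below is a minimal illustration using the torchvision library; the output resolution and scale range are illustrative assumptions, not the paper's settings.

import torchvision.transforms as T

# Minimal sketch of the real-image augmentation described above:
# random flip, then a random crop rescaled to a fixed output size.
# All parameter values are illustrative assumptions only.
augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),          # horizontal flip
    T.RandomResizedCrop(size=(512, 1024),   # crop + rescale to fixed size
                        scale=(0.5, 1.0)),  # keep 50%-100% of the area
    T.ToTensor(),                           # PIL image -> float tensor
])

# Each real training image (a PIL image) is then expanded on the fly:
# augmented = augment(real_image)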
Result
Extensive experiments on the SYNTHIA (synthetic collection of imagery and annotations), Virtual KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute), and VIPER (visual perception benchmark) datasets show that the proposed method effectively overcomes the impact of dataset distribution differences on model generalization and outperforms contemporaneous state-of-the-art methods; for example, on the SYNTHIA dataset, detection and segmentation performance improve by 3.8% and 2.7%, respectively.
Conclusion
Parallel vision is an important research direction in visual computing. Combined with deep learning, it will drive more and more intelligent vision systems to mature and enter practical application.
Objective
Computer vision gives machines their "eyes": through cameras and computers, a system acquires the abilities of segmentation, classification, recognition, tracking, and decision-making. In recent years, computer vision technology has been widely used in intelligent transportation, unmanned driving, robot navigation, intelligent video monitoring, and many other fields. The camera has become the most commonly used sensing device in automatic driving and smart cities, generating massive image and video data, and only by relying on computer vision technology can these data be analyzed and processed in real time, detecting all kinds of objects and accurately obtaining their positions and motion states from images and videos. However, actual scenes are highly complex, and many complicating factors interweave, which poses a great challenge to visual computing systems. At present, computer vision technology is mainly based on deep learning and driven by large-scale data; its training algorithms depend heavily on datasets, so sufficient data are needed. However, collecting and labeling large-scale image data from actual scenes is time-consuming and labor-intensive, and usually only small-scale image data of limited diversity can be obtained. For example, Microsoft common objects in context (MS COCO), a popular dataset for instance segmentation tasks, contains about 300,000 images covering 91 categories. Such datasets can hardly express the complexity of reality or simulate real situations. A model trained on a limited dataset lacks practical significance, because the dataset is not large enough to represent the real data distribution and cannot guarantee effectiveness in practical applications.
Method
The theory of social computing and parallel systems is built on artificial systems, computational experiments, and parallel execution (ACP). The ACP methodology plays an essential role in the modeling and control of complex systems: a virtual artificial society is constructed to connect the virtual and real worlds through parallel management. On the basis of existing facts, an artificial system models the behavior of a complex system; advanced computational experiments then analyze this behavior, and interaction with reality yields an operating system better than reality alone. To address the bottleneck deep learning faces in computer vision, this paper proposes parallel vision, a visual analysis framework based on parallel learning. Parallel vision is an intelligent visual perception framework that extends the ACP methodology into the computer vision field. Within this framework, large-scale realistic artificial images can be obtained easily, supplying the vision algorithm with enough well-labeled image data; in this way, the computer can be turned into a "laboratory" of computational intelligence. First, the artificial image system simulates the imaging conditions that may appear in actual images, uses the internal parameters of the system to obtain annotation information automatically, and produces the required artificial images. Then, we design the visual perception model with predictive learning and conduct computational experiments on the rich supply of image data generated by the artificial image system; this makes it convenient to study how difficult scenes, such as complex environmental conditions, affect the visual perception model, so that factors uncontrollable in practice become controllable and the interpretability of the visual model increases. Finally, we use prescriptive learning to optimize the model parameters: the difficulties of the visual perception model in actual scenes guide its training in artificial scenes, and we learn and optimize the visual perception model online through virtual-real interaction.

This paper also presents an application case study that preliminarily demonstrates the effectiveness of the proposed framework. The case study works with synthetic images carrying accurate annotations and real images without any labels; virtual-real interaction guides the model to learn useful information from the synthetic data while staying consistent with the real data. We first analyze the data distribution discrepancy from a probabilistic perspective and divide it into image-level and instance-level discrepancies. We then design two components to align these discrepancies, namely global-level alignment and local-level alignment, and further propose a consistency alignment component that encourages agreement between the global-level and local-level alignment components.
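To make the case study concrete, the sketch below shows one plausible PyTorch rendering of the three components: a global-level domain discriminator on pooled image features, a local-level discriminator on instance (region) features, and a consistency term that encourages the two sets of domain predictions to agree. Under a covariate-shift reading (our interpretation), the label-given-image distribution is shared across domains, so aligning the marginal feature distributions at both levels suffices. The module sizes, loss form, and gradient-reversal formulation are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; negates the gradient in the backward
    # pass, so the feature extractor is pushed toward domain-invariant
    # features while the discriminator learns to tell domains apart.
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

class DomainHead(nn.Module):
    # Small binary classifier: 0 = synthetic (source), 1 = real (target).
    def __init__(self, in_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 1))

    def forward(self, feat):
        return self.net(GradReverse.apply(feat))

def alignment_losses(img_feat, inst_feats, domain_label,
                     global_head, local_head):
    # img_feat: (B, C) pooled backbone features, one row per image.
    # inst_feats: (N, C) region (instance) features from the same images.
    # domain_label: 0.0 for a synthetic batch, 1.0 for a real batch.
    d_img = global_head(img_feat)    # global-level alignment logits (B, 1)
    d_inst = local_head(inst_feats)  # local-level alignment logits  (N, 1)
    loss_g = F.binary_cross_entropy_with_logits(
        d_img, torch.full_like(d_img, domain_label))
    loss_l = F.binary_cross_entropy_with_logits(
        d_inst, torch.full_like(d_inst, domain_label))
    # Consistency alignment: each instance-level domain probability should
    # agree with the (batch-averaged) image-level domain probability.
    loss_c = F.mse_loss(torch.sigmoid(d_inst),
                        torch.sigmoid(d_img).mean().expand_as(d_inst))
    return loss_g + loss_l + loss_c

In training, such losses would be added to the usual detection and segmentation losses computed on the labeled synthetic images; the unlabeled real images contribute only through the alignment terms.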
Result
We evaluate the proposed approach on the real Cityscapes dataset by adapting from the virtual SYNTHIA (synthetic collection of imagery and annotations), Virtual KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute), and VIPER (visual perception benchmark) datasets. Experimental results demonstrate that it achieves significantly better performance than state-of-the-art methods.
Conclusion
Parallel vision is an important research direction in the field of visual computing. Combined with deep learning, more and more intelligent vision systems will mature and move into application.
Keywords: computer vision; parallel learning; parallel vision; visual perception model; instance segmentation; object detection
Chen Y H, Li W and Van Gool L. 2018. ROAD: reality oriented adaptation for semantic segmentation of urban scenes//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7892-7901[DOI:10.1109/CVPR.2018.00823]
Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S and Schiele B. 2016. The cityscapes dataset for semantic urban scene understanding//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 3213-3223[DOI:10.1109/CVPR.2016.350]
Gaidon A, Wang Q, Cabon Y and Vig E. 2016. Virtual worlds as proxy for multi-object tracking analysis//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 4340-4349[DOI:10.1109/CVPR.2016.470]
Goyette N, Jodoin P M, Porikli F, Konrad J and Ishwar P. 2014. A novel video dataset for change detection benchmarking. IEEE Transactions on Image Processing, 23(11):4663-4679[DOI:10.1109/TIP.2014.2346013]
He K M, Gkioxari G, Dollár P and Girshick R. 2017. Mask R-CNN//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2980-2988[DOI:10.1109/ICCV.2017.322]
Hoffman J, Wang D, Yu F and Darrell T. 2016. FCNs in the wild: pixel-level adversarial and constraint-based adaptation[EB/OL].[2020-06-20].https://arxiv.org/pdf/1612.02649.pdf
Li L, Lin Y L, Cao D P, Zheng N N and Wang F Y. 2017. Parallel learning: a new framework for machine learning. Acta Automatica Sinica, 43(1):1-8[DOI:10.16383/j.aas.2017.y000001]
Li X, Wang K F, Tian Y L, Yan L, Deng F and Wang F Y. 2018. The ParallelEye dataset: a large collection of virtual images for traffic vision research. IEEE Transactions on Intelligent Transportation Systems, 20(6):2072-2084[DOI:10.1109/TITS.2018.2857566]
Lin T Y, Goyal P, Girshick R, He K M and Dollár P. 2017. Focal loss for dense object detection//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2980-2988[DOI:10.1109/ICCV.2017.324]
Lin T Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P and Zitnick C L. 2014. Microsoft COCO: common objects in context//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 740-755[DOI:10.1007/978-3-319-10602-1_48]
Liu T, Wang X, Xing Y, Gao Y, Tian B and Chen L. 2019. Research on digital quadruplets in cyber-physical-social space-based parallel driving. Chinese Journal of Intelligent Science and Technology, 1(1):40-51[DOI:10.11959/j.issn.2096-6652.201902]
Lyu Y S, Chen Y Y, Jin J C, Li Z J, Ye P J and Zhu F H. 2019. Parallel transportation: virtual-real interaction for intelligent traffic management and control. Chinese Journal of Intelligent Science and Technology, 1(1):21-33[DOI:10.11959/j.issn.2096-6652.201908]
Luo G Y, Yuan Q, Zhou H B, Cheng N, Liu Z H, Yang F C and Shen X S. 2018. Cooperative vehicular content distribution in edge computing assisted 5G-VANET. China Communications, 15(7):1-17[DOI:10.1109/CC.2018.8424578]
Luo G Y, Zhang H, He H B, Li J L and Wang F Y. 2020. Multiagent adversarial collaborative learning via mean-field theory[EB/OL].[2020-07-05].https://ieeexplore.ieee.org/document/9238422
Luo G Y, Zhou H B, Cheng N, Yuan Q, Li J L, Yang F C and Shen X S. 2019. Software defined cooperative data sharing in edge computing assisted 5G-VANET[EB/OL].[2020-07-05].https://ieeexplore.ieee.org/document/8897045
Ren S Q, He K M, Girshick R and Sun J. 2015. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6):1137-1149[DOI:10.1109/TPAMI.2016.2577031]
Richter S R, Hayder Z and Koltun V. 2017. Playing for benchmarks//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2213-2222[DOI:10.1109/ICCV.2017.243]
Ros G, Sellart L, Materzynska J, Vazquez D and Lopez A M. 2016. The SYNTHIA dataset: a large collection of synthetic images for semantic segmentation of urban scenes//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 3234-3243[DOI:10.1109/CVPR.2016.352]
Saleh F S, Aliakbarian M S, Salzmann M, Petersson L and Alvarez J M. 2017. Bringing background into the foreground: making all classes equal in weakly-supervised video semantic segmentation//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2125-2135[DOI:10.1109/ICCV.2017.232]
Saleh F S, Aliakbarian M S, Salzmann M, Petersson L and Alvarez J M. 2018. Effective use of synthetic data for urban scene semantic segmentation//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 86-103[DOI:10.1007/978-3-030-01216-8_6]
Torralba A and Efros A A. 2011. Unbiased look at dataset bias//Proceedings of 2011 IEEE Conference on Computer Vision and Pattern Recognition. Colorado Springs, USA: IEEE: 1521-1528[DOI:10.1109/CVPR.2011.5995347]
Wang F Y. 2004. Artificial societies, computational experiments, and parallel systems: a discussion on computational theory of complex social-economic systems. Complex Systems and Complexity Science, 1(4):25-35[DOI:10.3969/j.issn.1672-3813.2004.04.002]
Wang F Y, Li C G, Guo Y Y, Wang J, Wang X, Qiu T Y, Meng X B and Shi X B. 2017. Parallel gout: an ACP-based system framework for gout diagnosis and treatment. Pattern Recognition and Artificial Intelligence, 30(12):1057-1068[DOI:10.16451/j.cnki.issn1003-6059.201712001]
Wang F Y, Tang Y, Liu X W and Yuan Y. 2019. Social education: opportunities and challenges in cyber-physical-social space. IEEE Transactions on Computational Social Systems, 6(2):191-196[DOI:10.1109/TCSS.2019.2905941]
Wang K F, Gou C and Wang F Y. 2016. Parallel vision: an ACP-based approach to intelligent vision computing. Acta Automatica Sinica, 42(10):1490-1500[DOI:10.16383/j.aas.2016.c160604]
Wang K F, Gou C, Zheng N N, Rehg J M and Wang F Y. 2017. Parallel vision for perception and understanding of complex scenes: methods, framework, and perspectives. Artificial Intelligence Review, 48(3):299-329[DOI:10.1007/s10462-017-9569-z]
Wiley V and Lucas T. 2018. Computer vision and image processing: a paper review. International Journal of Artificial Intelligence Research, 2(1):28-36[DOI:10.29099/ijair.v2i1.42]
Yu F and Koltun V. 2016. Multi-scale context aggregation by dilated convolutions[EB/OL].[2020-06-20].https://arxiv.org/pdf/1511.07122.pdf
Yuille A L and Liu C X. 2020. Deep nets: what have they ever done for vision?[EB/OL].[2020-06-20].https://arxiv.org/pdf/1805.04025.pdf
Zhang H, Kang D Q, He H B and Wang F Y. 2020. APLNet: attention-enhanced progressive learning network. Neurocomputing, 371:166-176[DOI:10.1016/j.neucom.2019.08.086]
Zhang H, Tian Y L, Wang K F, Zhang W S and Wang F Y. 2019. Mask SSD: an effective single-stage approach to object instance segmentation. IEEE Transactions on Image Processing, 29:2078-2093[DOI:10.1109/TIP.2019.2947806]
Zhang H, Wang K F, Tian Y L, Gou C and Wang F Y. 2018. MFR-CNN: incorporating multi-scale features and global information for traffic object detection. IEEE Transactions on Vehicular Technology, 67(9):8019-8030[DOI:10.1109/TVT.2018.2843394]
Zhang H, Wang K F and Wang F Y. 2017. Advances and perspectives on applications of deep learning in visual object detection. Acta Automatica Sinica, 43(8):1289-1305[DOI:10.16383/j.aas.2017.c160822]
Zhang Y, David P and Gong B. 2017. Curriculum domain adaptation for semantic segmentation of urban scenes//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2020-2030[DOI:10.1109/ICCV.2017.223]