Enhanced risk perception method based on parallel vision for autonomous vehicles in safety-critical scenarios
2024, Vol. 29, No. 11, Pages 3265-3279
Print publication date: 2024-11-16
DOI: 10.11834/jig.230748
Gou Chao, Liu Xinxin, Guo Zipeng, Zhou Yuchen, Wang Feiyue. 2024. Enhanced risk perception method based on parallel vision for autonomous vehicles in safety-critical scenarios. Journal of Image and Graphics, 29(11): 3265-3279
Objective
With the rapid development of visual perception technology, autonomous driving can already be applied to simple scenarios. However, challenges remain in real-world complex urban roads, especially in safety-critical scenarios such as sudden lane changes by other vehicles, the intrusion of pedestrians, and the appearance of obstacles. Moreover, data for such critical scenarios follow a long-tail distribution in the real world, creating a technical bottleneck for data-driven risk perception in autonomous driving. This paper therefore proposes an enhanced risk perception method based on parallel vision.
Method
Based on the interactive ACP (artificial societies, computational experiments, parallel execution) theory, the proposed method integrates descriptive, prescriptive, and predictive intelligence under the parallel vision framework to achieve vision-based enhanced risk perception. Specifically, building on descriptive and prescriptive learning, an improved diffusion model is introduced into the artificial image system, with a background-adaptive module and a feature-fusion encoder; by controlling the specific locations at which dangerous elements such as pedestrians are generated, controllable generation of risk sequences for safety-critical scenarios is achieved. Second, a spatial-rule-based method extracts the spatial and interaction relationships between traffic entities to construct cognitive scene graphs. Finally, under the predictive learning framework, a new graph-based enhanced risk perception method is proposed that combines a relational graph attention network with a Transformer encoder module to perform spatiotemporal modeling of scene graph sequence data, ultimately achieving risk perception and prediction.
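As an illustrative sketch of the spatial-rule relation extraction described in the method, the following function assigns distance and orientation relations from the ego vehicle to surrounding entities. The threshold values, relation names, and coordinate convention are hypothetical assumptions for illustration, not the paper's actual parameters.

```python
import math

# Hypothetical distance thresholds in metres; the paper's actual rule set
# and threshold values are not specified in this abstract.
NEAR_T, VISIBLE_T = 10.0, 30.0

def spatial_edges(ego, entities):
    """Build rule-based edges from the ego vehicle to each traffic entity.

    ego, entities: (x, y) bird's-eye-view coordinates; returns a list of
    (entity_index, distance_relation, orientation_relation) triples.
    """
    edges = []
    for i, (x, y) in enumerate(entities):
        dx, dy = x - ego[0], y - ego[1]
        d = math.hypot(dx, dy)
        if d > VISIBLE_T:               # too far away: no edge at all
            continue
        dist_rel = "near" if d <= NEAR_T else "visible"
        # Orientation relative to the ego heading (assumed +y is "front")
        side = "left" if dx < 0 else "right"
        fb = "front" if dy >= 0 else "rear"
        edges.append((i, dist_rel, f"{fb}_{side}"))
    return edges
```

For example, with the ego at the origin, a pedestrian at (2, 5) yields a ("near", "front_right") edge, while an entity 40 m away produces no edge. Interaction relations would additionally compare such edges across consecutive frames.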
Result
To verify the effectiveness of the proposed method, comparative experiments against five mainstream risk perception methods were conducted on the MRSG-144 (mixed reality scene graph), IESG (interaction-enhanced scene graph), and 1043-carla-sg (1043-carla-scenegraph) datasets. The proposed method achieved F1-scores of 0.956, 0.944, and 0.916 on the three datasets, respectively, surpassing existing mainstream methods and achieving the best results.
Conclusion
This paper presents a practical application of parallel vision to risk perception for autonomous driving. It is of great significance for improving the risk perception capability of autonomous vehicles in complex traffic scenes and for ensuring the safety of autonomous driving systems.
Objective
With the rapid development of visual perception technology, autonomous driving can already be applied to simple scenarios. However, in real-world complex urban roads, challenges remain, especially in safety-critical scenarios such as sudden lane changes by other vehicles, the intrusion of pedestrians, and the appearance of obstacles. First, most existing autonomous driving systems are still trained and evaluated mainly on everyday natural scenes or heuristically generated adversarial scenes. Safety-critical scenarios, that is, scenes in which vehicles are at risk of collision, especially scenes involving vulnerable road users such as pedestrians, play an important role in evaluating the safety performance of autonomous driving systems. However, such scenarios occur with low probability in the real world, and the resulting critical-scene data follow a long-tail distribution, creating a technical bottleneck for data-driven risk perception in autonomous driving. Second, current rule-based scene generation methods and automatic virtual-simulation scene generation frameworks struggle to create genuinely new scenes, and the driving scenes they generate are often insufficiently realistic and lack diversity. By contrast, scene generation based on diffusion models not only fully exploits the characteristics of real data and fills gaps in the collected real data but also enables interpretable and controllable scene generation. In addition, limited risk perception capability in safety-critical scenarios remains a difficult problem.
For risk-aware safety assessment, traditional methods based on convolutional neural networks can extract the features of each object in the scene but cannot obtain high-level semantic information, that is, the relationships between traffic entities. Obtaining such high-level information is still challenging because most potential risks are hidden at the semantic and behavioral levels. Autonomous driving risk assessment based on traffic scene graphs has therefore become a popular research topic in recent years. By constructing and analyzing traffic scene graphs to capture the overall relationships and interactions in the traffic scene, potential risks can be effectively understood and predicted, providing highly accurate decision support for autonomous driving systems. From the perspective of human drivers' visual perception, different traffic entities pose different risks to autonomous vehicles. However, risk perception methods based on traffic scene graphs generally use graph convolution to iteratively update the feature representation of each node, which ignores the importance of the different types of edges between nodes during message passing. Considering these challenges, this paper proposes a risk-enhanced perception framework based on parallel vision that realizes automatic generation of safety-critical scene data and accounts for the importance of the different edge types between adjacent traffic entities.
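The limitation of plain graph convolution noted above can be seen in a minimal sketch: every in-edge contributes equally, regardless of its relation type. Scalar node features and the absence of learned weight matrices are simplifying assumptions made only for illustration.

```python
def gcn_node_update(h, edges):
    """One mean-aggregation graph-convolution step on scalar node features.

    h:     list of node features (floats)
    edges: list of (src, dst) pairs

    Each node with in-edges is replaced by the unweighted mean of its
    neighbors' features; nodes without in-edges keep their own feature.
    Note that edge types play no role here, which is exactly the
    shortcoming this paper addresses with relation-aware attention.
    """
    incoming = {i: [] for i in range(len(h))}
    for s, d in edges:
        incoming[d].append(h[s])
    return [sum(msgs) / len(msgs) if msgs else h[i]
            for i, msgs in sorted(incoming.items())]
```

A "near pedestrian" edge and a distant "visible vehicle" edge would be averaged identically here, motivating relation-specific, attention-weighted aggregation instead.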
Method
The proposed method is based on the interactive ACP (artificial societies, computational experiments, parallel execution) theory and integrates descriptive, prescriptive, and predictive intelligence under a parallel vision framework to achieve vision-based enhanced risk perception. Specifically, based on descriptive and prescriptive learning, a background-adaptive module and a feature-fusion encoder are first introduced into the diffusion model structure, reducing visible boundary contours around generated pedestrians and improving image quality. Controllable generation of risk sequences in safety-critical scenarios is achieved by controlling the specific locations at which dangerous elements, such as pedestrians, are generated. Second, a cognitive scene graph construction method based on spatial rules obtains the spatial position of each entity in the scene through object detection; using relative spatial positions and preset thresholds, the distance, orientation, and affiliation relationships between entities in the traffic scene are extracted, while interaction relationships are extracted mainly from changes in the spatial information of traffic entities over time. Finally, under the predictive learning framework, a new graph-based risk-enhanced perception method is proposed that integrates a relational graph attention network (RGAT) and a Transformer encoder module to perform spatiotemporal modeling of scene graph sequence data. RGAT introduces an attention mechanism that assigns different weights to different neighborhood relations and obtains node feature representations through weighted summation. The temporal Transformer encoder module models the temporal dynamics of the scene graph sequence, ultimately outputting risk-aware visual reasoning results.
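A single-head, scalar-feature sketch of the relation-aware attention update can make the RGAT idea concrete. Vector features, learned projection matrices, and multi-head attention are omitted; the relation weights and the scalar attention parameter are illustrative assumptions, not the paper's implementation.

```python
import math

def rgat_node_update(h, edges, rel_w, a):
    """One simplified RGAT-style update on scalar node features.

    h:      list of node features (floats)
    edges:  list of (src, dst, rel) triples
    rel_w:  per-relation weight, e.g. {"near": 2.0, "visible": 1.0}
    a:      scalar attention parameter

    Each edge carries the message h[src] * rel_w[rel] (a relation-specific
    transform). For every destination node, attention logits a * message
    are softmax-normalised over its in-edges, and the node feature becomes
    the attention-weighted sum of incoming messages.
    """
    incoming = {i: [] for i in range(len(h))}
    for s, d, r in edges:
        incoming[d].append(h[s] * rel_w[r])
    out = list(h)
    for d, msgs in incoming.items():
        if not msgs:
            continue
        logits = [a * m for m in msgs]
        m_max = max(logits)                      # stabilised softmax
        exps = [math.exp(l - m_max) for l in logits]
        z = sum(exps)
        out[d] = sum(e / z * m for e, m in zip(exps, msgs))
    return out
```

With a = 0 the attention is uniform and the update reduces to a relation-weighted mean; with a > 0 larger (e.g. riskier) messages receive more weight, which is the behavior the relation-aware attention is meant to provide.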
Result
Experiments were conducted on three datasets (MRSG-144, IESG, and 1043-carla-sg) to compare the performance with five mainstream risk perception methods based on graph-structured data and to verify the effectiveness of the proposed method. The proposed method achieved F1-scores of 0.956, 0.944, and 0.916 on the three datasets, surpassing the existing mainstream methods and achieving the best results. Additionally, ablation experiments revealed the contribution of each module to model performance. The introduction of virtual scene data notably boosted the performance of the risk perception model, increasing accuracy, area under the curve (AUC), and F1-score by 0.4%, 1.1%, and 1.2%, respectively.
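For reference, the F1-scores reported above are the harmonic mean of precision and recall; the counts in the example below are illustrative and unrelated to the paper's experiments.

```python
def f1_score(tp, fp, fn):
    """F1-score from true positives, false positives, and false negatives.

    F1 = 2 * precision * recall / (precision + recall); assumes at least
    one positive prediction and one positive label (no zero divisions).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```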
Conclusion
This work is a practical application of parallel vision in the field of autonomous driving risk perception. It is of considerable importance for enhancing the risk perception capabilities of autonomous vehicles in complex traffic scenarios and for ensuring the safety of autonomous driving systems.
Keywords: autonomous driving; parallel vision; cognitive scene graph; diffusion generation; risk perception