Research progress of automatic driving control technology based on reinforcement learning
2021, Vol. 26, No. 1, pp. 28-35
Published in print: 2021-01-16; Accepted: 2020-10-30
DOI: 10.11834/jig.200428
Feng Pan, Hong Bao. Research progress of automatic driving control technology based on reinforcement learning[J]. Journal of Image and Graphics, 2021, 26(1): 28-35.
An autonomous vehicle is in essence a wheeled mobile robot: a comprehensive system integrating pattern recognition, environment perception, planning and decision making, and intelligent control. Advances in artificial intelligence and machine learning have greatly promoted the development of autonomous driving technology. Mainstream machine learning methods fall into three categories: supervised learning, unsupervised learning, and reinforcement learning. Reinforcement learning is well suited to the intelligent handling of decision making and control for autonomous driving in complex traffic scenarios and helps improve the comfort and safety of autonomous driving. Deep reinforcement learning, which combines deep learning with reinforcement learning, has become a popular research direction in machine learning. This paper first briefly introduces autonomous driving technology, reinforcement learning methods, and autonomous driving control architecture, and describes the basic principles and research status of reinforcement learning. It then reviews the history and current state of reinforcement learning methods in autonomous driving control, presents typical applications of reinforcement-learning-based autonomous driving control in connection with the research and testing work of the intelligent vehicle team at Beijing Union University, and discusses the potential of deep reinforcement learning. Finally, it identifies the difficulties and challenges facing the research and application of reinforcement learning in autonomous driving control, including the safety of autonomous driving in real environments, multi-agent reinforcement learning, and the design of reward functions consistent with human driving behavior. This survey helps readers gain a deeper understanding of the advantages and limitations of reinforcement learning for autonomous driving control and can serve as a design reference for autonomous driving control systems.
Research on fully autonomous driving has been largely spurred by important international challenges and competitions, such as the well-known Defense Advanced Research Projects Agency (DARPA) Grand Challenge held in 2005. Self-driving cars have migrated from laboratory development and testing to driving on public roads. They are autonomous decision-making systems that process streams of observations from different on-board sources, such as cameras, radars, lidars, ultrasonic sensors, global positioning system units, and inertial sensors. The development of autonomous vehicles promises fewer road accidents and less traffic congestion. Most driving scenarios can be handled with classical perception, path planning, and motion control methods; the remaining unsolved scenarios are corner cases where traditional methods fail. In the past decade, advances in artificial intelligence (AI) and machine learning (ML) have greatly promoted the development of autonomous driving, which is a challenging application domain for ML. ML methods can be divided into supervised learning, unsupervised learning, and reinforcement learning (RL). RL is a family of algorithms that allow agents to learn how to act in different situations; in other words, a map or policy is established from situations (states) to actions so as to maximize a numerical reward signal.
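The following minimal Python sketch illustrates this state-to-action mapping and the accumulation of the reward signal; the `env` and `policy` objects are hypothetical stand-ins, not the interface of any particular simulator or library.

```python
# Minimal sketch of the agent-environment loop behind the RL definition.
# `env` and `policy` are hypothetical: env.reset() returns a state, and
# env.step(action) returns (next_state, reward, done).
def run_episode(env, policy, gamma=0.99):
    """Roll out one episode and accumulate the discounted return."""
    state = env.reset()
    ret, discount, done = 0.0, 1.0, False
    while not done:
        action = policy(state)               # policy: state -> action
        state, reward, done = env.step(action)
        ret += discount * reward             # numerical reward signal
        discount *= gamma
    return ret
```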
Most autonomous vehicles have a modular hierarchical structure that can be divided into four components or layers: perception, decision making, control, and actuation. RL is suitable for decision making and control in complex traffic scenarios and can improve the safety and comfort of autonomous driving. Traditional controllers rely on an a priori model with fixed parameters; when robots or other autonomous systems operate in complex environments such as driving, such controllers cannot foresee every situation the system has to cope with. An RL controller, by contrast, is a learning controller: it uses training information to improve its model over time, and with every gathered batch of training data the approximation of the true system model becomes more accurate. Deep neural networks have been applied as function approximators for RL agents, allowing agents to generalize knowledge to new, unseen situations, and new algorithms have been developed for problems with continuous state and action spaces. This paper introduces the current status and progress of RL methods applied to autonomous driving control and consists of five sections.
The first section introduces the background of autonomous driving and basic knowledge of ML and RL. The second section briefly describes the architecture of an autonomous driving framework. The control layer is an important part of an autonomous vehicle and has always been a key area of autonomous driving research. The control system of autonomous driving mainly comprises lateral control and longitudinal control, namely steering control and velocity control. Lateral control deals with the path-tracking problem, while longitudinal control deals with tracking a reference speed and keeping a safe distance from the preceding vehicle.
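As a concrete illustration of this division of labor, the sketch below pairs a textbook longitudinal rule (track a reference speed subject to a safe time headway) with a textbook pure-pursuit lateral rule; the gains, headway, and wheelbase values are illustrative assumptions, not the controllers of any vehicle discussed here.

```python
import math

def longitudinal_control(v, v_ref, gap, t_headway=1.5, k_v=0.4):
    """Track a reference speed while keeping a safe time headway.
    v, v_ref in m/s; gap to the preceding vehicle in m."""
    v_safe = min(v_ref, gap / t_headway)     # slow down as the gap shrinks
    return k_v * (v_safe - v)                # throttle/brake command

def lateral_control(dx, dy, wheelbase=2.7):
    """Pure-pursuit steering toward a look-ahead point (dx, dy)
    expressed in the vehicle frame (dx forward, dy lateral)."""
    ld2 = max(dx * dx + dy * dy, 1e-6)       # squared look-ahead distance
    curvature = 2.0 * dy / ld2               # circular arc to the point
    return math.atan(wheelbase * curvature)  # steering angle in rad
```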
The third section introduces the basic principles of RL methods and focuses on the current research status of RL in autonomous driving control. RL algorithms are based on the Markov decision process (MDP) and aim to learn a mapping from situations to actions that maximizes a scalar reward or reinforcement signal. RL is at once a new and a very old topic in AI; it became an active and identifiable area of ML in the 1980s. Q-learning is a widely used RL algorithm, but it is based on a tabular setting and can only deal with problems that have low-dimensional, discrete state and action spaces.
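The tabular limitation is easy to see in code: a minimal Q-learning sketch (again assuming a hypothetical discrete `env`) stores one table entry per state-action pair, which becomes infeasible for raw, high-dimensional sensory input.

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-learning: one table entry per (state, action) pair."""
    Q = defaultdict(float)                   # Q[(state, action)] -> value
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda x: Q[(s, x)])
            s2, r, done = env.step(a)
            # one-step temporal-difference update (no bootstrap at terminal)
            best_next = 0.0 if done else max(Q[(s2, x)] for x in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q
```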
A primary goal of AI is to solve complex tasks from unprocessed, high-dimensional sensory input. Significant progress has been made by combining deep learning for sensory processing with RL, resulting in the deep Q-network (DQN) algorithm, which achieves human-level performance on many Atari video games using raw pixels as input. However, DQN can only handle discrete, low-dimensional action spaces. The deep deterministic policy gradient (DDPG) algorithm was proposed for problems with continuous state and action spaces and can learn policies directly from raw pixel inputs.
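The sketch below shows the kind of deterministic actor such an algorithm trains: a convolutional encoder maps raw pixels to features, and a tanh head outputs a continuous action such as a steering command. The layer sizes are illustrative assumptions, and DDPG's critic network and training loop are omitted.

```python
import torch.nn as nn

class Actor(nn.Module):
    """Schematic DDPG-style deterministic actor: raw pixels -> action."""
    def __init__(self, action_dim=1):
        super().__init__()
        self.encoder = nn.Sequential(        # expects (N, 1, 84, 84) input
            nn.Conv2d(1, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(64 * 9 * 9, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),  # action in [-1, 1]
        )

    def forward(self, pixels):
        return self.head(self.encoder(pixels))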
The fourth section surveys typical applications of RL algorithms in autonomous driving, including some studies by our team. Unlike supervised learning, RL is well suited to the decision making and control of autonomous driving. Most RL algorithms used in autonomous driving combine deep learning and use raw pixels as input to achieve end-to-end control. The last section discusses the challenges encountered in applying RL algorithms to autonomous driving control. The first challenge is how to deploy an RL model trained in a simulator to a real environment while ensuring safety. The second challenge is RL in environments with multiple participants: multi-agent RL is a direction in which RL is developing, but training multiple agents is more complicated than training a single agent. The third challenge is how to train an agent with a reasonable reward function. In most RL settings, a reward function is assumed to be given, but this is not always the case; imitation learning and inverse RL provide an effective way to recover a reward function that brings the agent's performance close to that of a human driver.
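The sketch below illustrates the kind of hand-crafted reward this third challenge refers to, combining speed tracking, lane keeping, comfort, and a collision penalty; the terms and weights are assumptions made for illustration, not a reward used in the studies surveyed here.

```python
def driving_reward(v, v_ref, lane_offset, jerk, collided):
    """Illustrative hand-crafted driving reward (assumes v_ref > 0)."""
    if collided:
        return -100.0                        # hard safety penalty
    r_speed = -abs(v - v_ref) / v_ref        # track the reference speed
    r_lane = -abs(lane_offset)               # stay near the lane center
    r_comfort = -0.1 * abs(jerk)             # penalize harsh actuation
    return r_speed + r_lane + r_comfort
```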
This article helps readers understand the advantages and limitations of RL methods in autonomous driving control and the potential of deep RL, and it can serve as a reference for the design of autonomous driving control systems.
Keywords: autonomous driving; decision control; Markov decision process (MDP); reinforcement learning (RL); data-driven; autonomous learning