Research progress of automatic driving control technology based on reinforcement learning
2021, Vol. 26, No. 1, pp. 28-35
Published in print: 2021-01-16; Accepted: 2020-10-30
DOI: 10.11834/jig.200428
Feng Pan, Hong Bao. Research progress of automatic driving control technology based on reinforcement learning[J]. Journal of Image and Graphics, 2021, 26(1): 28-35.
An autonomous vehicle is in essence a wheeled mobile robot: a comprehensive system integrating pattern recognition, environment perception, planning and decision making, and intelligent control. Advances in artificial intelligence and machine learning have greatly promoted the development of autonomous driving technology. Mainstream machine learning methods fall into three categories: supervised learning, unsupervised learning, and reinforcement learning. Reinforcement learning is well suited to the intelligent handling of decision making and control for autonomous driving in complex traffic scenarios and helps improve the comfort and safety of autonomous driving. Deep reinforcement learning, which combines deep learning with reinforcement learning, has become a popular research direction in machine learning. This paper first briefly introduces autonomous driving technology, reinforcement learning methods, and autonomous driving control architecture, and describes the basic principles and research status of reinforcement learning. It then reviews the history and current state of reinforcement learning methods in autonomous driving control, presents typical applications of reinforcement-learning-based autonomous driving control in connection with the research and testing work of the intelligent vehicle team at Beijing Union University, and discusses the potential of deep reinforcement learning. Finally, it identifies the difficulties and challenges facing the research and application of reinforcement learning in autonomous driving control, including the safety of autonomous driving in real environments, multi-agent reinforcement learning, and the design of reward functions consistent with human driving behavior. This survey helps readers gain a deeper understanding of the advantages and limitations of reinforcement learning for autonomous driving control and can serve as a design reference for autonomous driving control systems.
Research on fully autonomous driving has been largely spurred by important international challenges and competitions, such as the well-known Defense Advanced Research Projects Agency (DARPA) Grand Challenge held in 2005. Self-driving cars have migrated from laboratory development and testing to driving on public roads. They are autonomous decision-making systems that process streams of observations from different on-board sources, such as cameras, radars, lidars, ultrasonic sensors, global positioning system units, and inertial sensors. The development of autonomous vehicles promises fewer road accidents and less traffic congestion. Most driving scenarios can be handled with classical perception, path planning, and motion control methods; the remaining unsolved scenarios are corner cases where traditional methods fail. In the past decade, advances in artificial intelligence (AI) and machine learning (ML) have greatly promoted the development of autonomous driving, which is a challenging application domain for ML. ML methods can be divided into supervised learning, unsupervised learning, and reinforcement learning (RL). RL is a family of algorithms that allow agents to learn how to act in different situations; in other words, a map or policy is established from situations (states) to actions so as to maximize a numerical reward signal.
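The following minimal Python sketch illustrates this state-to-action mapping and the accumulation of the reward signal; the `env` and `policy` objects are hypothetical stand-ins, not the interface of any particular simulator or library.

```python
# Minimal sketch of the agent-environment loop behind the RL definition.
# `env` and `policy` are hypothetical: env.reset() returns a state, and
# env.step(action) returns (next_state, reward, done).
def run_episode(env, policy, gamma=0.99):
    """Roll out one episode and accumulate the discounted return."""
    state = env.reset()
    ret, discount, done = 0.0, 1.0, False
    while not done:
        action = policy(state)               # policy: state -> action
        state, reward, done = env.step(action)
        ret += discount * reward             # numerical reward signal
        discount *= gamma
    return ret
```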
Most autonomous vehicles have a modular hierarchical structure that can be divided into four components or layers: perception, decision making, control, and actuation. RL is suitable for decision making and control in complex traffic scenarios and can improve the safety and comfort of autonomous driving. Traditional controllers rely on an a priori model with fixed parameters; when robots or other autonomous systems operate in complex environments such as driving, such controllers cannot foresee every situation the system has to cope with. An RL controller, by contrast, is a learning controller: it uses training information to improve its model over time, and with every gathered batch of training data the approximation of the true system model becomes more accurate. Deep neural networks have been applied as function approximators for RL agents, allowing agents to generalize knowledge to new, unseen situations, and new algorithms have been developed for problems with continuous state and action spaces. This paper introduces the current status and progress of RL methods applied to autonomous driving control and consists of five sections.
The first section introduces the background of autonomous driving and basic knowledge of ML and RL. The second section briefly describes the architecture of an autonomous driving framework. The control layer is an important part of an autonomous vehicle and has always been a key area of autonomous driving research. The control system of autonomous driving mainly comprises lateral control and longitudinal control, namely steering control and velocity control. Lateral control deals with the path-tracking problem, while longitudinal control deals with tracking a reference speed and keeping a safe distance from the preceding vehicle.
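As a concrete illustration of this division of labor, the sketch below pairs a textbook longitudinal rule (track a reference speed subject to a safe time headway) with a textbook pure-pursuit lateral rule; the gains, headway, and wheelbase values are illustrative assumptions, not the controllers of any vehicle discussed here.

```python
import math

def longitudinal_control(v, v_ref, gap, t_headway=1.5, k_v=0.4):
    """Track a reference speed while keeping a safe time headway.
    v, v_ref in m/s; gap to the preceding vehicle in m."""
    v_safe = min(v_ref, gap / t_headway)     # slow down as the gap shrinks
    return k_v * (v_safe - v)                # throttle/brake command

def lateral_control(dx, dy, wheelbase=2.7):
    """Pure-pursuit steering toward a look-ahead point (dx, dy)
    expressed in the vehicle frame (dx forward, dy lateral)."""
    ld2 = max(dx * dx + dy * dy, 1e-6)       # squared look-ahead distance
    curvature = 2.0 * dy / ld2               # circular arc to the point
    return math.atan(wheelbase * curvature)  # steering angle in rad
```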
The third section introduces the basic principles of RL methods and focuses on the current research status of RL in autonomous driving control. RL algorithms are based on the Markov decision process (MDP) and aim to learn a mapping from situations to actions that maximizes a scalar reward or reinforcement signal. RL is at once a new and a very old topic in AI; it became an active and identifiable area of ML in the 1980s. Q-learning is a widely used RL algorithm, but it is based on a tabular setting and can only deal with problems that have low-dimensional, discrete state and action spaces.
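The tabular limitation is easy to see in code: a minimal Q-learning sketch (again assuming a hypothetical discrete `env`) stores one table entry per state-action pair, which becomes infeasible for raw, high-dimensional sensory input.

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-learning: one table entry per (state, action) pair."""
    Q = defaultdict(float)                   # Q[(state, action)] -> value
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda x: Q[(s, x)])
            s2, r, done = env.step(a)
            # one-step temporal-difference update (no bootstrap at terminal)
            best_next = 0.0 if done else max(Q[(s2, x)] for x in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q
```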
A primary goal of AI is to solve complex tasks from unprocessed, high-dimensional sensory input. Significant progress has been made by combining deep learning for sensory processing with RL, resulting in the deep Q-network (DQN) algorithm, which achieves human-level performance on many Atari video games using raw pixels as input. However, DQN can only handle discrete, low-dimensional action spaces. The deep deterministic policy gradient (DDPG) algorithm was proposed for problems with continuous state and action spaces and can learn policies directly from raw pixel inputs.
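The sketch below shows the kind of deterministic actor such an algorithm trains: a convolutional encoder maps raw pixels to features, and a tanh head outputs a continuous action such as a steering command. The layer sizes are illustrative assumptions, and DDPG's critic network and training loop are omitted.

```python
import torch.nn as nn

class Actor(nn.Module):
    """Schematic DDPG-style deterministic actor: raw pixels -> action."""
    def __init__(self, action_dim=1):
        super().__init__()
        self.encoder = nn.Sequential(        # expects (N, 1, 84, 84) input
            nn.Conv2d(1, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(64 * 9 * 9, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),  # action in [-1, 1]
        )

    def forward(self, pixels):
        return self.head(self.encoder(pixels))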
The fourth section surveys typical applications of RL algorithms in autonomous driving, including some studies by our team. Unlike supervised learning, RL is well suited to the decision making and control of autonomous driving. Most RL algorithms used in autonomous driving combine deep learning and use raw pixels as input to achieve end-to-end control. The last section discusses the challenges encountered in applying RL algorithms to autonomous driving control. The first challenge is how to deploy an RL model trained in a simulator to a real environment while ensuring safety. The second challenge is RL in environments with multiple participants: multi-agent RL is a direction in which RL is developing, but training multiple agents is more complicated than training a single agent. The third challenge is how to train an agent with a reasonable reward function. In most RL settings, a reward function is assumed to be given, but this is not always the case; imitation learning and inverse RL provide an effective way to recover a reward function that brings the agent's performance close to that of a human driver.
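The sketch below illustrates the kind of hand-crafted reward this third challenge refers to, combining speed tracking, lane keeping, comfort, and a collision penalty; the terms and weights are assumptions made for illustration, not a reward used in the studies surveyed here.

```python
def driving_reward(v, v_ref, lane_offset, jerk, collided):
    """Illustrative hand-crafted driving reward (assumes v_ref > 0)."""
    if collided:
        return -100.0                        # hard safety penalty
    r_speed = -abs(v - v_ref) / v_ref        # track the reference speed
    r_lane = -abs(lane_offset)               # stay near the lane center
    r_comfort = -0.1 * abs(jerk)             # penalize harsh actuation
    return r_speed + r_lane + r_comfort
```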
This article helps readers understand the advantages and limitations of RL methods in autonomous driving control and the potential of deep RL, and it can serve as a reference for the design of autonomous driving control systems.
Keywords: autonomous driving; decision control; Markov decision process (MDP); reinforcement learning (RL); data-driven; autonomous learning