IEEE/RSJ International Conference on Intelligent Robots and Systems

Planning on the fast lane: Learning to interact using attention mechanisms in path integral inverse reinforcement learning


Abstract

General-purpose trajectory planning algorithms for automated driving utilize complex reward functions to perform a combined optimization of strategic, behavioral, and kinematic features. The specification and tuning of a single reward function is a tedious task and does not generalize over a large set of traffic situations. Deep learning approaches based on path integral inverse reinforcement learning have been successfully applied to predict local situation-dependent reward functions using features of a set of sampled driving policies. Sample-based trajectory planning algorithms are able to approximate a spatio-temporal subspace of feasible driving policies that can be used to encode the context of a situation. However, the interaction with dynamic objects requires an extended planning horizon, which depends on sequential context modeling. In this work, we are concerned with the sequential reward prediction over an extended time horizon. We present a neural network architecture that uses a policy attention mechanism to generate a low-dimensional context vector by concentrating on trajectories with a human-like driving style. Apart from this, we propose a temporal attention mechanism to identify context switches and allow for stable adaptation of rewards. We evaluate our results on complex simulated driving situations, including other moving vehicles. Our evaluation shows that our policy attention mechanism learns to focus on collision-free policies in the configuration space. Furthermore, the temporal attention mechanism learns persistent interaction with other vehicles over an extended planning horizon.
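To make the architecture concrete, the sketch below illustrates the policy attention idea from the abstract: attention weights over a set of sampled driving policies are used to pool their features into a single low-dimensional context vector. In path integral inverse reinforcement learning, such sampled policies also serve to approximate the partition function of the trajectory distribution, with the reward fit so that human-like trajectories receive high probability. All names and dimensions here (PolicyAttention, feat_dim, ctx_dim) are illustrative assumptions, not the authors' implementation:

```python
# Minimal sketch of a policy attention mechanism (assumed design, not the
# paper's code): soft attention over N sampled driving policies produces one
# low-dimensional context vector summarizing the situation.
import torch
import torch.nn as nn

class PolicyAttention(nn.Module):
    def __init__(self, feat_dim: int, ctx_dim: int):
        super().__init__()
        self.proj = nn.Linear(feat_dim, ctx_dim)   # per-policy embedding
        self.score = nn.Linear(ctx_dim, 1)         # scalar attention score per policy

    def forward(self, policy_feats: torch.Tensor) -> torch.Tensor:
        # policy_feats: (batch, n_policies, feat_dim) features of sampled trajectories
        h = torch.tanh(self.proj(policy_feats))     # (B, N, ctx_dim)
        w = torch.softmax(self.score(h), dim=1)     # (B, N, 1) weights over policies
        return (w * h).sum(dim=1)                   # (B, ctx_dim) context vector

# Usage: 64 sampled policies, each described by 16 features.
attn = PolicyAttention(feat_dim=16, ctx_dim=8)
ctx = attn(torch.randn(2, 64, 16))                  # -> shape (2, 8)
```

The temporal attention described in the abstract would apply the same weighting idea across context vectors from successive planning cycles rather than across policies, which is what allows the predicted rewards to adapt stably when the traffic context switches.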
