IEEE/RSJ International Conference on Intelligent Robots and Systems

Planning on the fast lane: Learning to interact using attention mechanisms in path integral inverse reinforcement learning


Abstract

General-purpose trajectory planning algorithms for automated driving utilize complex reward functions to perform a combined optimization of strategic, behavioral, and kinematic features. The specification and tuning of a single reward function is a tedious task and does not generalize over a large set of traffic situations. Deep learning approaches based on path integral inverse reinforcement learning have been successfully applied to predict local situation-dependent reward functions using features of a set of sampled driving policies. Sample-based trajectory planning algorithms are able to approximate a spatio-temporal subspace of feasible driving policies that can be used to encode the context of a situation. However, the interaction with dynamic objects requires an extended planning horizon, which depends on sequential context modeling. In this work, we are concerned with the sequential reward prediction over an extended time horizon. We present a neural network architecture that uses a policy attention mechanism to generate a low-dimensional context vector by concentrating on trajectories with a human-like driving style. Apart from this, we propose a temporal attention mechanism to identify context switches and allow for stable adaptation of rewards. We evaluate our results on complex simulated driving situations, including other moving vehicles. Our evaluation shows that our policy attention mechanism learns to focus on collision-free policies in the configuration space. Furthermore, the temporal attention mechanism learns persistent interaction with other vehicles over an extended planning horizon.
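To make the architecture concrete, the sketch below illustrates the policy attention idea from the abstract: attention weights over a set of sampled driving policies are used to pool their features into a single low-dimensional context vector. In path integral inverse reinforcement learning, such sampled policies also serve to approximate the partition function of the trajectory distribution, with the reward fit so that human-like trajectories receive high probability. All names and dimensions here (PolicyAttention, feat_dim, ctx_dim) are illustrative assumptions, not the authors' implementation:

```python
# Minimal sketch of a policy attention mechanism (assumed design, not the
# paper's code): soft attention over N sampled driving policies produces one
# low-dimensional context vector summarizing the situation.
import torch
import torch.nn as nn

class PolicyAttention(nn.Module):
    def __init__(self, feat_dim: int, ctx_dim: int):
        super().__init__()
        self.proj = nn.Linear(feat_dim, ctx_dim)   # per-policy embedding
        self.score = nn.Linear(ctx_dim, 1)         # scalar attention score per policy

    def forward(self, policy_feats: torch.Tensor) -> torch.Tensor:
        # policy_feats: (batch, n_policies, feat_dim) features of sampled trajectories
        h = torch.tanh(self.proj(policy_feats))     # (B, N, ctx_dim)
        w = torch.softmax(self.score(h), dim=1)     # (B, N, 1) weights over policies
        return (w * h).sum(dim=1)                   # (B, ctx_dim) context vector

# Usage: 64 sampled policies, each described by 16 features.
attn = PolicyAttention(feat_dim=16, ctx_dim=8)
ctx = attn(torch.randn(2, 64, 16))                  # -> shape (2, 8)
```

The temporal attention described in the abstract would apply the same weighting idea across context vectors from successive planning cycles rather than across policies, which is what allows the predicted rewards to adapt stably when the traffic context switches.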
