IEEE International Conference on Machine Learning and Applications

Discretionary Lane Change Decision Making using Reinforcement Learning with Model-Based Exploration



Abstract

Deep reinforcement learning (DRL) techniques have been used to solve the discretionary lane change decision-making problem and have shown promising results. However, since the input information for the discretionary lane change problem is continuous and can be high-dimensional, optimizing the exploration-exploitation trade-off remains an open challenge for DRL. Conventional model-free exploration methods lack a systematic way to incorporate additional engineering or model-based knowledge of the application, so training can be inefficient and may settle on an impractical policy, e.g. an infeasible lane change strategy. Much of the previous related work uses a rule-based safety check policy to guide exploration and collect input data; however, this is not guaranteed to reach the optimal policy, and its performance depends on the safety check policy selected. In this paper, we develop an explicit statistical aggregated environment model using a conditional variational auto-encoder, together with a model-based exploration strategy that leverages it. The agent is guided to explore by a surprise-based intrinsic reward derived from the environment model. The result is compared with annealing epsilon-greedy exploration and with rule-based safety check exploration. We demonstrate that the performance of the developed model-based exploration method is comparable to the best rule-based safety check exploration and much better than epsilon-greedy exploration.
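The central mechanism described in the abstract, a surprise-based intrinsic reward computed from a learned conditional variational auto-encoder (CVAE) transition model, can be sketched as follows. This is a minimal PyTorch illustration under stated assumptions, not the authors' implementation: the class and function names (TransitionCVAE, intrinsic_reward), the network sizes, and the use of the negative ELBO as the surprise signal are illustrative choices, with actions assumed to be encoded as one-hot vectors for the discrete lane change decisions.

```python
# Sketch (not the paper's code): a CVAE environment model p(s' | s, a) and a
# surprise-based intrinsic reward equal to the negative ELBO of an observed
# transition, so poorly predicted transitions earn a larger exploration bonus.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TransitionCVAE(nn.Module):
    def __init__(self, state_dim, action_dim, latent_dim=8, hidden=64):
        super().__init__()
        cond_dim = state_dim + action_dim  # condition on (state, one-hot action)
        # Encoder q(z | s', s, a) outputs mean and log-variance of the latent.
        self.enc = nn.Sequential(
            nn.Linear(state_dim + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),
        )
        # Decoder p(s' | z, s, a) reconstructs the next state.
        self.dec = nn.Sequential(
            nn.Linear(latent_dim + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action, next_state):
        cond = torch.cat([state, action], dim=-1)
        mu, logvar = self.enc(torch.cat([next_state, cond], dim=-1)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        recon = self.dec(torch.cat([z, cond], dim=-1))
        return recon, mu, logvar

    def neg_elbo(self, state, action, next_state):
        # Reconstruction error plus KL(q(z|s',s,a) || N(0,I)), per transition.
        recon, mu, logvar = self(state, action, next_state)
        rec = F.mse_loss(recon, next_state, reduction="none").sum(dim=-1)
        kl = -0.5 * (1.0 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1)
        return rec + kl


def intrinsic_reward(model, state, action, next_state, scale=0.1):
    # "Surprise" bonus: transitions the environment model predicts poorly get a
    # larger reward, steering exploration toward unfamiliar traffic situations.
    with torch.no_grad():
        return scale * model.neg_elbo(state, action, next_state)
```

In a training loop following this sketch, the CVAE would be fitted on replay-buffer transitions by minimizing `neg_elbo`, while the DRL agent's reward would be augmented as `r_total = r_ext + intrinsic_reward(model, s, a, s_next)`, so the bonus shrinks for transitions the model has already learned to predict well.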
