Knowledge-Based Systems

Formula-E race strategy development using distributed policy gradient reinforcement learning

Abstract

Energy and thermal management is a crucial element of Formula-E race strategy development. In this study, race-level strategy development is formulated as a Markov decision process (MDP) problem featuring a hybrid-type action space. Deep Deterministic Policy Gradient (DDPG) reinforcement learning is implemented under the distributed architecture Ape-X and integrated with prioritized experience replay and reward shaping techniques to optimize a hybrid-type set of actions with both continuous and discrete components. Soft boundary violation penalties in reward shaping significantly improve the performance of DDPG and make it capable of generating faster race-finishing solutions. The proposed method shows superior performance in comparison to Monte Carlo Tree Search (MCTS) with policy gradient reinforcement learning, which, as presented in the literature, solves this problem in a fully discrete action space. The advantages are a faster race finishing time and better handling of ambient temperature rise. (C) 2021 Elsevier B.V. All rights reserved.
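The abstract names three ingredients: a hybrid continuous/discrete action space, DDPG trained under the Ape-X distributed architecture with prioritized experience replay, and soft boundary violation penalties in reward shaping. Below is a minimal sketch of the first and last of these two ideas. It is not the authors' implementation; all function names, limits, and penalty weights are illustrative assumptions.

```python
# Sketch (illustrative assumptions, not the paper's code) of two ideas from
# the abstract: a hybrid continuous/discrete action derived from a single
# continuous DDPG output, and a "soft boundary" penalty that grades
# constraint violations instead of failing the episode outright.
import numpy as np


def split_hybrid_action(raw_action: np.ndarray):
    """Map a raw DDPG output in [-1, 1]^2 onto a hybrid action:
    a continuous energy-deployment level and a discrete mode flag."""
    energy_level = float(np.clip(raw_action[0], -1.0, 1.0))  # continuous part
    mode_flag = int(raw_action[1] > 0.0)                     # thresholded -> discrete part
    return energy_level, mode_flag


def shaped_reward(lap_time: float,
                  battery_temp: float,
                  energy_used: float,
                  temp_limit: float = 60.0,    # assumed thermal limit (deg C)
                  energy_limit: float = 52.0,  # assumed usable energy (kWh)
                  penalty_weight: float = 10.0) -> float:
    """Base reward is negative lap time (faster laps score higher).
    Boundary violations add a soft, graded penalty proportional to how far
    the limit was exceeded, rather than a hard termination."""
    reward = -lap_time
    temp_excess = max(0.0, battery_temp - temp_limit)
    energy_excess = max(0.0, energy_used - energy_limit)
    reward -= penalty_weight * (temp_excess + energy_excess)
    return reward
```

Because the penalty grows smoothly with the size of the violation, the gradient signal near the boundary stays informative, which is consistent with the abstract's claim that soft penalties help DDPG find faster race-finishing solutions.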
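The Ape-X component relies on proportional prioritized experience replay. The following sketch, again illustrative only, shows the sampling math (priorities raised to alpha, normalized to probabilities, with importance-sampling weights to correct the bias); a production Ape-X buffer would use a sum-tree for O(log N) sampling rather than this linear scan.

```python
# Illustrative proportional prioritized sampling in the style of
# prioritized experience replay as used by Ape-X learners.
import numpy as np


def sample_prioritized(priorities: np.ndarray, batch_size: int,
                       alpha: float = 0.6, beta: float = 0.4):
    """Sample indices with probability p_i^alpha / sum_j p_j^alpha and
    return importance-sampling weights (N * P(i))^-beta, normalized by
    their maximum for numerical stability."""
    scaled = priorities ** alpha
    probs = scaled / scaled.sum()
    idx = np.random.choice(len(priorities), size=batch_size, p=probs)
    weights = (len(priorities) * probs[idx]) ** (-beta)
    weights /= weights.max()
    return idx, weights
```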
