Knowledge-Based Systems

Formula-E race strategy development using distributed policy gradient reinforcement learning

Abstract

Energy and thermal management is a crucial element of Formula-E race strategy development. In this study, race-level strategy development is formulated as a Markov decision process (MDP) problem featuring a hybrid-type action space. Deep Deterministic Policy Gradient (DDPG) reinforcement learning is implemented under the distributed architecture Ape-X and integrated with prioritized experience replay and reward shaping techniques to optimize a hybrid-type set of actions with both continuous and discrete components. Soft boundary violation penalties in reward shaping significantly improve the performance of DDPG and make it capable of generating faster race-finishing solutions. The proposed method shows superior performance in comparison to Monte Carlo Tree Search (MCTS) with policy gradient reinforcement learning, which, as presented in the literature, solves this problem in a fully discrete action space. The advantages are a faster race finishing time and better handling of ambient temperature rise. (C) 2021 Elsevier B.V. All rights reserved.
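The abstract names three ingredients: a hybrid continuous/discrete action space, DDPG trained under the Ape-X distributed architecture with prioritized experience replay, and soft boundary violation penalties in reward shaping. Below is a minimal sketch of the first and last of these two ideas. It is not the authors' implementation; all function names, limits, and penalty weights are illustrative assumptions.

```python
# Sketch (illustrative assumptions, not the paper's code) of two ideas from
# the abstract: a hybrid continuous/discrete action derived from a single
# continuous DDPG output, and a "soft boundary" penalty that grades
# constraint violations instead of failing the episode outright.
import numpy as np


def split_hybrid_action(raw_action: np.ndarray):
    """Map a raw DDPG output in [-1, 1]^2 onto a hybrid action:
    a continuous energy-deployment level and a discrete mode flag."""
    energy_level = float(np.clip(raw_action[0], -1.0, 1.0))  # continuous part
    mode_flag = int(raw_action[1] > 0.0)                     # thresholded -> discrete part
    return energy_level, mode_flag


def shaped_reward(lap_time: float,
                  battery_temp: float,
                  energy_used: float,
                  temp_limit: float = 60.0,    # assumed thermal limit (deg C)
                  energy_limit: float = 52.0,  # assumed usable energy (kWh)
                  penalty_weight: float = 10.0) -> float:
    """Base reward is negative lap time (faster laps score higher).
    Boundary violations add a soft, graded penalty proportional to how far
    the limit was exceeded, rather than a hard termination."""
    reward = -lap_time
    temp_excess = max(0.0, battery_temp - temp_limit)
    energy_excess = max(0.0, energy_used - energy_limit)
    reward -= penalty_weight * (temp_excess + energy_excess)
    return reward
```

Because the penalty grows smoothly with the size of the violation, the gradient signal near the boundary stays informative, which is consistent with the abstract's claim that soft penalties help DDPG find faster race-finishing solutions.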
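The Ape-X component relies on proportional prioritized experience replay. The following sketch, again illustrative only, shows the sampling math (priorities raised to alpha, normalized to probabilities, with importance-sampling weights to correct the bias); a production Ape-X buffer would use a sum-tree for O(log N) sampling rather than this linear scan.

```python
# Illustrative proportional prioritized sampling in the style of
# prioritized experience replay as used by Ape-X learners.
import numpy as np


def sample_prioritized(priorities: np.ndarray, batch_size: int,
                       alpha: float = 0.6, beta: float = 0.4):
    """Sample indices with probability p_i^alpha / sum_j p_j^alpha and
    return importance-sampling weights (N * P(i))^-beta, normalized by
    their maximum for numerical stability."""
    scaled = priorities ** alpha
    probs = scaled / scaled.sum()
    idx = np.random.choice(len(priorities), size=batch_size, p=probs)
    weights = (len(priorities) * probs[idx]) ** (-beta)
    weights /= weights.max()
    return idx, weights
```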
