International Conference on Advanced Robotics and Mechatronics

Approximate Soft Policy Iteration Based Reinforcement Learning for Differential Games with Two Pursuers versus One Evader



Abstract

Pursuit-evasion is a well-studied problem in optimal control, differential games, and game theory. In complex multi-pursuer environments with constraints, traditional approaches struggle to find optimal solutions. This paper establishes an obstacle-free bounded environment for two pursuers versus one evader and develops an approximate soft policy iteration (ASPI) algorithm that uses a value neural network to provide a cooperative policy for the pursuers. On this basis, the training process consists of two phases with a dense-sparse reward scheme: the first phase uses a dense reward to quickly acquire general pursuit ability, and the second phase uses a sparse reward signal to shape the value function closer to the true distribution. Simulation results and a comparative analysis against classical deep Q-learning demonstrate the superior performance of this approach in terms of capture win rate and training time cost.
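The two ideas named in the abstract, a soft (Boltzmann) policy derived from action values and a dense-then-sparse reward schedule, can be sketched minimally as below. This is an illustrative sketch only: the paper's exact reward terms, temperature, and network architecture are not given in the abstract, so `temperature`, `capture_radius`, and the shaping form are assumptions.

```python
import numpy as np

def soft_policy(q_values, temperature=0.5):
    """Boltzmann (softmax) policy over action values: the 'soft'
    improvement step of soft policy iteration."""
    z = np.asarray(q_values, dtype=float) / temperature
    z -= z.max()                      # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def reward(pursuer, evader, phase, capture_radius=0.5):
    """Two-phase reward: dense distance shaping first, then a sparse
    capture-only signal (illustrative form, not the paper's exact terms)."""
    dist = np.linalg.norm(np.asarray(pursuer) - np.asarray(evader))
    if phase == "dense":
        return -dist                  # shaped: closer to the evader is better
    return 1.0 if dist <= capture_radius else 0.0  # sparse: capture only

# Example: action values favoring the second action
probs = soft_policy([1.0, 2.0, 0.5])
```

The dense phase gives a gradient-rich signal everywhere in the arena, while the sparse phase rewards only actual captures, which is what lets the value function converge toward the true return distribution.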
