International Conference on Advanced Robotics and Mechatronics

Approximate Soft Policy Iteration Based Reinforcement Learning for Differential Games with Two Pursuers versus One Evader



Abstract

Pursuit-evasion is a well-studied problem in optimal control, differential games, and game theory. In complex multi-pursuer environments with constraints, traditional approaches struggle to find optimal solutions. This paper establishes an obstacle-free bounded environment for two pursuers versus one evader and develops an approximate soft policy iteration (ASPI) algorithm that uses a value neural network to provide a cooperative policy for the pursuers. On this basis, the training process consists of two phases with a dense-sparse reward scheme: the first phase uses a dense reward to quickly acquire general pursuit ability, and the second phase uses a sparse reward signal to shape the value function closer to the true distribution. Simulation results and a comparative analysis against classical deep Q-learning demonstrate the superior performance of this approach in terms of capture win rate and training time cost.
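The two ideas named in the abstract, a soft (Boltzmann) policy derived from action values and a dense-then-sparse reward schedule, can be sketched minimally as below. This is an illustrative sketch only: the paper's exact reward terms, temperature, and network architecture are not given in the abstract, so `temperature`, `capture_radius`, and the shaping form are assumptions.

```python
import numpy as np

def soft_policy(q_values, temperature=0.5):
    """Boltzmann (softmax) policy over action values: the 'soft'
    improvement step of soft policy iteration."""
    z = np.asarray(q_values, dtype=float) / temperature
    z -= z.max()                      # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def reward(pursuer, evader, phase, capture_radius=0.5):
    """Two-phase reward: dense distance shaping first, then a sparse
    capture-only signal (illustrative form, not the paper's exact terms)."""
    dist = np.linalg.norm(np.asarray(pursuer) - np.asarray(evader))
    if phase == "dense":
        return -dist                  # shaped: closer to the evader is better
    return 1.0 if dist <= capture_radius else 0.0  # sparse: capture only

# Example: action values favoring the second action
probs = soft_policy([1.0, 2.0, 0.5])
```

The dense phase gives a gradient-rich signal everywhere in the arena, while the sparse phase rewards only actual captures, which is what lets the value function converge toward the true return distribution.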
