Data-driven adaptive dynamic programming for partially observable nonzero-sum games via Q-learning method

Wang Wei; Chen Xin; Fu Hao; Wu Min

Abstract

This paper addresses a class of discrete-time linear nonzero-sum games in which the system state is only partially observable. The optimal control policy for a nonzero-sum game is known to rely on full state measurement, which is hard to obtain in a partially observable environment. Moreover, achieving optimal control requires an accurate system model. To overcome these limitations, this paper develops a data-driven adaptive dynamic programming method based on Q-learning that uses measurable input/output data and requires no knowledge of the system model. First, a representation of the unmeasurable internal system state is built from historical input/output data. Then, based on this state representation, a Q-function-based policy iteration approach, with convergence analysis, is introduced to approximate the optimal control policy iteratively. A neural network (NN)-based actor-critic framework is applied to implement the developed data-driven approach. Finally, two simulation examples demonstrate the effectiveness of the developed approach.
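
The idea of representing the unmeasured state through historical input/output data can be illustrated with a small numerical sketch. This is not the paper's construction, which this record does not give; it is the standard observability-based argument for a linear system x_{k+1} = A x_k + B u_k, y_k = C x_k, under which x_k is an exact linear function of the last N inputs and outputs. The matrices `A`, `B1`, `B2`, `C`, the window length `N`, and the helper names `io_state_maps` and `simulate` are all hypothetical. The model is used here only to check that such a linear map exists; a data-driven method would presumably work with the stacked input/output history directly and never compute these maps.

```python
import numpy as np


def io_state_maps(A, B, C, N):
    """Return (M_u, M_y) such that x_k = M_u @ u_hist + M_y @ y_hist.

    u_hist and y_hist stack the N most recent inputs and outputs,
    oldest first: [u_{k-N}; ...; u_{k-1}] and [y_{k-N}; ...; y_{k-1}].
    """
    n, m, p = A.shape[0], B.shape[1], C.shape[0]
    powers = [np.linalg.matrix_power(A, i) for i in range(N + 1)]
    # Observability matrix: y_hist = V @ x_{k-N} + T @ u_hist
    V = np.vstack([C @ powers[j] for j in range(N)])
    if np.linalg.matrix_rank(V) < n:
        raise ValueError("(A, C) is not observable within N steps")
    # Controllability-like matrix: x_k = A^N @ x_{k-N} + U @ u_hist
    U = np.hstack([powers[N - 1 - i] @ B for i in range(N)])
    # Block lower-triangular Toeplitz matrix of Markov parameters
    T = np.zeros((N * p, N * m))
    for j in range(N):
        for i in range(j):
            T[j * p:(j + 1) * p, i * m:(i + 1) * m] = C @ powers[j - 1 - i] @ B
    M_y = powers[N] @ np.linalg.pinv(V)
    M_u = U - M_y @ T
    return M_u, M_y


def simulate(A, B, C, x0, inputs):
    """Roll the system forward; return states x_0..x_K and outputs y_0..y_{K-1}."""
    xs, ys = [x0], []
    for u in inputs:
        ys.append(C @ xs[-1])
        xs.append(A @ xs[-1] + B @ u)
    return np.array(xs), np.array(ys)


# Hypothetical two-player system: each player has one scalar input.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B1 = np.array([[0.0], [1.0]])   # input matrix of player 1 (assumed)
B2 = np.array([[1.0], [0.5]])   # input matrix of player 2 (assumed)
B = np.hstack([B1, B2])         # joint input u_k = [u1_k; u2_k]
C = np.array([[1.0, 0.0]])      # only the first state component is measured
N = 2                           # history window length

rng = np.random.default_rng(0)
inputs = rng.normal(size=(10, 2))
xs, ys = simulate(A, B, C, rng.normal(size=2), inputs)

M_u, M_y = io_state_maps(A, B, C, N)
k = 10
x_hat = M_u @ inputs[k - N:k].ravel() + M_y @ ys[k - N:k].ravel()
max_error = float(np.max(np.abs(x_hat - xs[k])))
print("reconstruction error:", max_error)
```

In this noise-free setting the reconstruction is exact up to floating-point rounding, which is why a Q-function defined on the stacked input/output history can stand in for one defined on the true state.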

Bibliographic details

Source
International Journal of Systems Science, 2019, Issue 8, pp. 1338-1352 (15 pages)
Authors
Wang Wei; Chen Xin; Fu Hao; Wu Min
Author affiliations

China Univ Geosci Sch Automat Wuhan 430074 Hubei Peoples R China|Hubei Key Lab Adv Control & Intelligent Automat C Wuhan 430074 Hubei Peoples R China (listed once for each of the four authors)

Original format: PDF
Language: English
Keywords
Adaptive dynamic programming; nonzero-sum games; partially observable; Q-learning;

Similar literature

1. Data-driven adaptive dynamic programming for partially observable nonzero-sum games via Q-learning method [J]. Wang Wei, Chen Xin, Fu Hao. International journal of systems science. 2019, Issue 5a8
2. Off-policy based adaptive dynamic programming method for nonzero-sum games on discrete-time system [J]. Wen Yinlei, Zhang Huaguang, Ren He. Journal of the Franklin Institute. 2020, Issue 12
3. Data-driven adaptive dynamic programming for continuous-time fully cooperative games with partially constrained inputs [J]. Zhang Qichao, Zhao Dongbin, Zhu Yuanheng. Neurocomputing. 2017, Issue MAY17
4. Data-driven adaptive dynamic programming for two-player nonzero-sum game [C]. Qichao Zhang, Dongbin Zhao, Yafei Zhou. Chinese Control and Decision Conference. 2017
5. Automating inhabitant interactions in home and workplace environments through data-driven generation of hierarchical partially-observable Markov decision processes [D]. Youngblood, Gregory Michael. 2005
6. Optimizing adaptive cancer therapy: dynamic programming and evolutionary game theory [O]. Mark Gluzman, Jacob G. Scott, Alexander Vladimirsky. 2020
7. Dynamic Programming for One-Sided Partially Observable Pursuit-Evasion Games [O]. Horák, Karel; Bošanský, Branislav. 2016
8. New Algorithms for Collaborative and Adversarial Decision Making in Partially Observable Stochastic Games [R]. Zilberstein, S. 2009