
Data-driven adaptive dynamic programming for partially observable nonzero-sum games via Q-learning method

Abstract

This paper is concerned with a class of discrete-time linear nonzero-sum games with a partially observable system state. The optimal control policy for nonzero-sum games is known to rely on full state measurement, which is hard to obtain in a partially observable environment. Moreover, achieving optimal control requires an accurate system model. To overcome these limitations, this paper develops a data-driven adaptive dynamic programming method via Q-learning that uses measurable input/output data without any knowledge of the system model. First, a representation of the unmeasurable internal system state is built from historical input/output data. Then, based on this state representation, a Q-function-based policy iteration approach with convergence analysis is introduced to approximate the optimal control policy iteratively. A neural network (NN)-based actor-critic framework is applied to implement the developed data-driven approach. Finally, two simulation examples are provided to demonstrate the effectiveness of the developed approach.
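The first step named in the abstract, representing the unmeasurable state from historical input/output data, can be illustrated with a small numerical sketch. This is not the paper's implementation: the two-player system matrices `A`, `B1`, `B2`, `C`, the history length `N`, and the helper names `simulate` and `build_history_state` are all assumptions made for illustration. The sketch relies on a standard fact for observable linear systems: the current state is a linear function of the last `N` inputs and outputs, provided `N` is at least the observability index. The true state is used here only to check that fact, not to build the representation.

```python
import numpy as np

def simulate(A, B1, B2, C, T, rng):
    """Simulate x[k+1] = A x[k] + B1 u1[k] + B2 u2[k], y[k] = C x[k] with random inputs."""
    n = A.shape[0]
    x = np.zeros((T, n))
    x[0] = rng.standard_normal(n)
    u1 = rng.standard_normal((T, B1.shape[1]))  # player 1 exploration inputs
    u2 = rng.standard_normal((T, B2.shape[1]))  # player 2 exploration inputs
    for k in range(T - 1):
        x[k + 1] = A @ x[k] + B1 @ u1[k] + B2 @ u2[k]
    y = x @ C.T
    return x, u1, u2, y

def build_history_state(u1, u2, y, N):
    """For each k >= N, stack the inputs and outputs from steps k-N .. k-1 into one vector."""
    rows = []
    for k in range(N, len(y)):
        past = slice(k - N, k)
        rows.append(np.concatenate([u1[past].ravel(), u2[past].ravel(), y[past].ravel()]))
    return np.array(rows)

# Hypothetical 2-state, 2-player system; (A, C) is observable with index 2.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B1 = np.array([[0.0], [1.0]])
B2 = np.array([[1.0], [0.0]])
C = np.array([[1.0, 0.0]])
N = 2  # history length, assumed >= observability index

rng = np.random.default_rng(0)
x, u1, u2, y = simulate(A, B1, B2, C, T=200, rng=rng)

Z = build_history_state(u1, u2, y, N)  # representation built from measurable data only
X = x[N:]                              # true states, used only to verify the claim

# Fit a linear map from the history vector to the state and measure the worst-case error.
M, *_ = np.linalg.lstsq(Z, X, rcond=None)
max_error = np.max(np.abs(Z @ M - X))
print(Z.shape, max_error)
```

Because the fitted map reproduces the state to within numerical precision in this noise-free setting, a Q-function that is quadratic in the state and inputs can equally be written as a quadratic in the history vector and inputs, which is what makes learning from input/output data alone plausible.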

Bibliographic details

  • Source
    International Journal of Systems Science | 2019, Issue 8 | pp. 1338-1352 | 15 pages
  • Authors

    Wang Wei; Chen Xin; Fu Hao; Wu Min;

  • Affiliations

    China Univ Geosci Sch Automat Wuhan 430074 Hubei Peoples R China | Hubei Key Lab Adv Control & Intelligent Automat C Wuhan 430074 Hubei Peoples R China (listed identically for all four authors)

  • Indexing
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    Adaptive dynamic programming; nonzero-sum games; partially observable; Q-learning;

