Model-Based Reinforcement Learning for Partially Observable Games with Sampling-Based State Estimation


Abstract

Games constitute a challenging domain of reinforcement learning (RL) for acquiring strategies, because many of them involve multiple players and many unobservable variables in a large state space. The difficulty of solving such realistic multiagent problems with partial observability arises mainly from the fact that the computational cost of estimation and prediction over the whole state space, including unobservable variables, is prohibitively high. To overcome this intractability and enable an agent to learn in an unknown environment, an effective approximation method that explicitly learns a model of the environment is required. We present a model-based RL scheme for large-scale multiagent problems with partial observability and apply it to the card game hearts. This game is a well-defined example of an imperfect-information game and can be approximately formulated as a partially observable Markov decision process (POMDP) for a single learning agent. To reduce the computational cost, we use a sampling technique in which the heavy integration required for estimation and prediction is approximated by a feasible number of samples. Computer simulation results show that our method is effective in solving such a difficult, partially observable multiagent problem.
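The sampling technique the abstract describes, approximating the Bayes-filter integral over unobservable state variables with a finite set of samples, can be illustrated with a particle-filter-style belief update. The toy dynamics, observation model, and all names below are illustrative assumptions for a minimal POMDP on a ring of 10 states; they are not the paper's actual hearts formulation.

```python
import random

def transition(state, action, rng):
    # Toy dynamics (assumed): move left/right on a ring of 10 positions;
    # with probability 0.1 the move fails and the state stays put.
    step = 1 if action == "right" else -1
    if rng.random() < 0.1:
        step = 0
    return (state + step) % 10

def obs_likelihood(obs, state):
    # Toy observation model (assumed): the true state is observed
    # correctly with prob 0.8, each neighbour with prob 0.1.
    if obs == state:
        return 0.8
    if obs in ((state - 1) % 10, (state + 1) % 10):
        return 0.1
    return 0.0

def belief_update(particles, action, obs, rng):
    """Approximate the heavy belief-update integral with samples:
    propagate each particle through the transition model, weight it
    by the observation likelihood, then resample in proportion."""
    propagated = [transition(s, action, rng) for s in particles]
    weights = [obs_likelihood(obs, s) for s in propagated]
    total = sum(weights)
    if total == 0.0:  # every particle inconsistent: reset uniformly
        return [rng.randrange(10) for _ in particles]
    return rng.choices(propagated, weights=weights, k=len(particles))

rng = random.Random(0)
belief = [rng.randrange(10) for _ in range(500)]  # uniform prior
for action, obs in [("right", 3), ("right", 4), ("right", 5)]:
    belief = belief_update(belief, action, obs, rng)

# After three mutually consistent observations, the particle mass
# concentrates near state 5.
mode = max(set(belief), key=belief.count)
```

The number of particles (500 here) is the "plausible number of samples" trading accuracy against computational cost: the exact update would sum over all hidden states, whereas the sampled update touches only the particles.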

Bibliographic Details

  • Source
    Neural Computation | 2007, No. 11 | pp. 3051-3087 | 37 pages
  • Authors

    Hajime Fujita; Shin Ishii;

  • Affiliation

    Nara Institute of Science and Technology, Graduate School of Information Science, Ikoma, Nara 630-0192, Japan;

  • Indexed in: Science Citation Index (SCI); Chemical Abstracts (CA)
  • Format: PDF
  • Language: English
  • Classification: Artificial intelligence theory
