Model-Based Reinforcement Learning for Partially Observable Games with Sampling-Based State Estimation


Abstract

Games constitute a challenging domain of reinforcement learning (RL) for acquiring strategies, because many of them involve multiple players and many unobservable variables in a large state space. The difficulty of solving such realistic multiagent problems with partial observability arises mainly from the fact that the computational cost of estimation and prediction over the whole state space, including unobservable variables, is prohibitively high. To overcome this intractability and enable an agent to learn in an unknown environment, an effective approximation method that explicitly learns a model of the environment is required. We present a model-based RL scheme for large-scale multiagent problems with partial observability and apply it to the card game hearts. This game is a well-defined example of an imperfect-information game and can be approximately formulated as a partially observable Markov decision process (POMDP) for a single learning agent. To reduce the computational cost, we use a sampling technique in which the heavy integration required for estimation and prediction is approximated by a feasible number of samples. Computer simulation results show that our method is effective in solving such a difficult, partially observable multiagent problem.
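The sampling technique the abstract describes, approximating the Bayes-filter integral over unobservable state variables with a finite set of samples, can be illustrated with a particle-filter-style belief update. The toy dynamics, observation model, and all names below are illustrative assumptions for a minimal POMDP on a ring of 10 states; they are not the paper's actual hearts formulation.

```python
import random

def transition(state, action, rng):
    # Toy dynamics (assumed): move left/right on a ring of 10 positions;
    # with probability 0.1 the move fails and the state stays put.
    step = 1 if action == "right" else -1
    if rng.random() < 0.1:
        step = 0
    return (state + step) % 10

def obs_likelihood(obs, state):
    # Toy observation model (assumed): the true state is observed
    # correctly with prob 0.8, each neighbour with prob 0.1.
    if obs == state:
        return 0.8
    if obs in ((state - 1) % 10, (state + 1) % 10):
        return 0.1
    return 0.0

def belief_update(particles, action, obs, rng):
    """Approximate the heavy belief-update integral with samples:
    propagate each particle through the transition model, weight it
    by the observation likelihood, then resample in proportion."""
    propagated = [transition(s, action, rng) for s in particles]
    weights = [obs_likelihood(obs, s) for s in propagated]
    total = sum(weights)
    if total == 0.0:  # every particle inconsistent: reset uniformly
        return [rng.randrange(10) for _ in particles]
    return rng.choices(propagated, weights=weights, k=len(particles))

rng = random.Random(0)
belief = [rng.randrange(10) for _ in range(500)]  # uniform prior
for action, obs in [("right", 3), ("right", 4), ("right", 5)]:
    belief = belief_update(belief, action, obs, rng)

# After three mutually consistent observations, the particle mass
# concentrates near state 5.
mode = max(set(belief), key=belief.count)
```

The number of particles (500 here) is the "plausible number of samples" trading accuracy against computational cost: the exact update would sum over all hidden states, whereas the sampled update touches only the particles.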

Bibliographic Details

  • Source
    Neural Computation | 2007, No. 11 | pp. 3051-3087 | 37 pages
  • Authors

    Hajime Fujita; Shin Ishii;

  • Affiliation

    Nara Institute of Science and Technology, Graduate School of Information Science, Ikoma, Nara 630-0192, Japan;

  • Indexed in: Science Citation Index (SCI); Chemical Abstracts (CA)
  • Format: PDF
  • Language: English
  • Classification: Artificial intelligence theory
