Journal: 電子情報通信学会技術研究報告. ニューロコンピューティング. Neurocomputing

A Role of the Asymptotic Equipartition Property in Return Maximization of Reinforcement Learning



Abstract

Reinforcement learning is well known as an effective framework for describing a decision-making process that consists of interactions between an agent and an environment. In this framework, an agent learns an optimal policy via return maximization, not via choices instructed by a supervisor. The process treated in reinforcement learning is in general formulated as an ergodic Markov decision process and is designed by tuning some parameters of the action-selection strategy so that the learning process eventually becomes almost stationary. In this paper, we examine a theoretical class of more general processes in which the agent can achieve return maximization, by considering the asymptotic equipartition property of such processes. As a result, we show several necessary conditions that the agent and the environment have to satisfy for return maximization to be possible.
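
For context (this restatement is standard information theory, not taken from the paper itself): for a stationary ergodic source, the asymptotic equipartition property, in the form of the Shannon-McMillan-Breiman theorem, states that the per-symbol log-probability of a sample path converges almost surely to the entropy rate $H(\mathcal{X})$:

$$
-\frac{1}{n}\,\log p(X_1, X_2, \ldots, X_n) \;\to\; H(\mathcal{X}) \quad \text{almost surely as } n \to \infty .
$$

Informally, long sample paths then concentrate on a typical set of roughly $2^{nH}$ sequences, each with probability close to $2^{-nH}$; this is the property the abstract invokes when relating more general interaction processes to return maximization.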
