Journal: 電子情報通信学会技術研究報告. ニューロコンピューティング. Neurocomputing

A Role of the Asymptotic Equipartition Property in Return Maximization of Reinforcement Learning



Abstract

Reinforcement learning is well known as an effective framework for describing a decision-making process that consists of interactions between an agent and an environment. In this framework, an agent learns an optimal policy via return maximization, not via choices instructed by a supervisor. The process treated in reinforcement learning is in general formulated as an ergodic Markov decision process and is designed by tuning some parameters of the action-selection strategy so that the learning process eventually becomes almost stationary. In this paper, we examine a theoretical class of more general processes in which the agent can achieve return maximization, by considering the asymptotic equipartition property of such processes. As a result, we show several necessary conditions that the agent and the environment have to satisfy for return maximization to be possible.
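
For context (this restatement is standard information theory, not taken from the paper itself): for a stationary ergodic source, the asymptotic equipartition property, in the form of the Shannon-McMillan-Breiman theorem, states that the per-symbol log-probability of a sample path converges almost surely to the entropy rate $H(\mathcal{X})$:

$$
-\frac{1}{n}\,\log p(X_1, X_2, \ldots, X_n) \;\to\; H(\mathcal{X}) \quad \text{almost surely as } n \to \infty .
$$

Informally, long sample paths then concentrate on a typical set of roughly $2^{nH}$ sequences, each with probability close to $2^{-nH}$; this is the property the abstract invokes when relating more general interaction processes to return maximization.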
