International Conference on Neural Information Processing (ICONIP 2010)
An Information-Spectrum Approach to Analysis of Return Maximization in Reinforcement Learning



Abstract

In reinforcement learning, Markov decision processes are the most popular class of stochastic sequential decision processes. Their analysis frequently assumes stationarity, ergodicity, or both, yet most stochastic sequential decision processes arising in reinforcement learning are not necessarily Markovian, stationary, or ergodic. In this paper, we give an information-spectrum analysis of return maximization in processes more general than stationary or ergodic Markov decision processes. We also present a class of stochastic sequential decision processes satisfying a necessary condition for return maximization, and provide several examples of best sequences, in the sense of return maximization, within this class.
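To make the abstract's setting concrete: the return in such a process is a discounted cumulative reward, and in a non-Markovian process the reward at each step may depend on the entire history rather than only the current state. The sketch below is purely illustrative and is not the paper's method; the `drifting_reward` function is a hypothetical history-dependent reward invented for this example.

```python
import random

def sample_return(reward_fn, horizon, gamma=0.95, seed=0):
    """Sample one trajectory's discounted return from a generic
    (not necessarily Markovian) stochastic reward sequence."""
    rng = random.Random(seed)
    history = []
    total = 0.0
    for t in range(horizon):
        # The reward may depend on the whole history, not just the
        # most recent state, so the process need not be Markovian.
        r = reward_fn(history, rng)
        total += (gamma ** t) * r
        history.append(r)
    return total

def drifting_reward(history, rng):
    # Hypothetical example: the reward drifts with the running mean
    # of past rewards, making the process history-dependent.
    base = sum(history) / len(history) if history else 0.0
    return 0.5 * base + rng.random()

print(sample_return(drifting_reward, horizon=100))
```

Because the reward drifts with its own history, this trajectory distribution is neither stationary nor Markovian, which is the kind of generality the paper's information-spectrum analysis is meant to cover.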
