Journal: Neural Networks: The Official Journal of the International Neural Network Society

An information-theoretic analysis of return maximization in reinforcement learning.


Abstract

We present a general analysis of return maximization in reinforcement learning. This analysis does not require assumptions of Markovianity, stationarity, and ergodicity for the stochastic sequential decision processes of reinforcement learning. Instead, our analysis assumes the asymptotic equipartition property fundamental to information theory, providing a substantially different view from that in the literature. As our main results, we show that return maximization is achieved by the overlap of typical and best sequence sets, and we present a class of stochastic sequential decision processes with the necessary condition for return maximization. We also describe several examples of best sequences in terms of return maximization in the class of stochastic sequential decision processes, which satisfy the necessary condition.
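
The asymptotic equipartition property (AEP) that the abstract relies on can be illustrated numerically. The sketch below is not the paper's method. It assumes an i.i.d. Bernoulli source purely for simplicity (the paper itself does not require such assumptions), and all function names and parameters are hypothetical. It checks how often a sampled sequence falls in the typical set, meaning its per-symbol negative log-probability lies within `eps` of the source entropy.

```python
import math
import random


def entropy_bits(p):
    """Shannon entropy in bits of a Bernoulli(p) source, 0 < p < 1."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def sample_entropy_rate(seq, p):
    """Return -(1/n) * log2 P(x_1, ..., x_n) under an i.i.d. Bernoulli(p) model."""
    ones = sum(seq)
    zeros = len(seq) - ones
    log_prob = ones * math.log2(p) + zeros * math.log2(1 - p)
    return -log_prob / len(seq)


def is_typical(seq, p, eps):
    """True if seq lies in the eps-typical set of the Bernoulli(p) source."""
    return abs(sample_entropy_rate(seq, p) - entropy_bits(p)) <= eps


def typical_fraction(p, n, eps, trials, seed=0):
    """Estimate the probability that a length-n sample is eps-typical."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(trials):
        seq = [1 if rng.random() < p else 0 for _ in range(n)]
        hits += is_typical(seq, p, eps)
    return hits / trials


if __name__ == "__main__":
    # Illustrative parameters only (assumed, not taken from the paper).
    for n in (20, 200, 2000):
        print(n, typical_fraction(p=0.3, n=n, eps=0.05, trials=500))
```

Under these assumptions the estimated fraction grows toward 1 as `n` increases, which is the sense in which almost all probability mass concentrates on typical sequences. The abstract's main result concerns whether the best sequences in terms of return overlap with that set.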
