Exploration in Metric State Spaces

Abstract

We present metric-E³, a provably near-optimal algorithm for reinforcement learning in Markov decision processes in which there is a natural metric on the state space that allows the construction of accurate local models. The algorithm is a generalization of the E³ algorithm of Kearns and Singh, and assumes a black box for approximate planning. Unlike the original E³, metric-E³ finds a near-optimal policy in an amount of time that does not directly depend on the size of the state space, but instead depends on the covering number of the state space. Informally, the covering number is the number of neighborhoods required for accurate local modeling.
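Since the covering number is the quantity that drives the running time, a rough illustration may help. The following minimal Python sketch is not from the paper: the function greedy_cover, the toy state set, and the metric are illustrative assumptions. It greedily builds an ε-cover, and the size of that cover witnesses the covering number at resolution ε.

```python
def greedy_cover(states, dist, eps):
    """Greedily build an eps-cover: a set of centers such that every
    state lies within distance eps of some center. The size of the
    returned cover is one witness for the covering number at
    resolution eps (the true covering number is the minimum size
    over all eps-covers)."""
    centers = []
    for s in states:
        # Only open a new neighborhood if s is not already covered.
        if all(dist(s, c) > eps for c in centers):
            centers.append(s)
    return centers

# Toy example: 101 states on the interval [0, 10] with the usual metric.
states = [i / 10 for i in range(101)]
cover = greedy_cover(states, dist=lambda x, y: abs(x - y), eps=0.5)
print(f"{len(cover)} neighborhoods of radius 0.5 cover the state space")
```

In this toy setting, metric-E³'s guarantees would scale with a quantity like len(cover) rather than len(states), which is what makes the dependence on the covering number, rather than the raw state count, attractive for large or continuous state spaces.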
