European Workshop on Reinforcement Learning

A Framework for Computing Bounds for the Return of a Policy

Abstract

We present a framework for computing bounds for the return of a policy in finite-horizon, continuous-state Markov Decision Processes with bounded state transitions. The state transition bounds can be based on either prior knowledge alone, or on a combination of prior knowledge and data. Our framework uses a piecewise-constant representation of the return bounds and a backwards iteration process. We instantiate this framework for a previously investigated type of prior knowledge - namely, Lipschitz continuity of the transition function. In this context, we show that the existing bounds of Fonteneau et al. (2009, 2010) can be expressed as a particular instantiation of our framework, by bounding the immediate rewards using Lipschitz continuity and choosing a particular form for the regions in the piecewise-constant representation. We also show how different instantiations of our framework can improve upon their bounds.
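To make the abstract's main ideas concrete, here is a minimal illustrative sketch of the kind of computation it describes: a piecewise-constant lower bound on the finite-horizon return, computed by backward iteration, with the immediate reward and successor state bounded via an assumed Lipschitz property. Everything specific here is an assumption for illustration: the 1-D state space, the uniform grid, the constants L_F and L_RHO, and the toy sample batch are all hypothetical, and this is one possible instantiation in the spirit of the framework, not the authors' implementation or the Fonteneau et al. bounds themselves.

```python
import numpy as np

# Illustrative sketch only (not the paper's code). Setting: 1-D state
# space [0, 1], a fixed policy with deterministic closed-loop dynamics and
# reward, both assumed Lipschitz with constants L_F and L_RHO. We compute
# a piecewise-constant LOWER bound on the T-step return by backward
# iteration over a uniform partition, using observed one-step transitions
# (x_i, r_i, x_i') collected under the evaluated policy.

L_F, L_RHO = 0.9, 1.0       # assumed Lipschitz constants (hypothetical)
T = 3                        # horizon
K = 50                       # number of cells in the piecewise-constant grid
edges = np.linspace(0.0, 1.0, K + 1)
centers = 0.5 * (edges[:-1] + edges[1:])
h = edges[1] - edges[0]      # cell width

rng = np.random.default_rng(0)
# Hypothetical batch from a toy Lipschitz system, for demonstration only.
xs = rng.uniform(0.0, 1.0, 40)
rs = 1.0 - np.abs(xs - 0.5)              # reward samples
xps = np.clip(0.8 * xs + 0.1, 0.0, 1.0)  # next-state samples

def cell_of(x):
    """Index of the grid cell containing state x."""
    return int(np.clip(np.searchsorted(edges, x, side="right") - 1, 0, K - 1))

# lb[t, j] lower-bounds the (T - t)-step return from any state in cell j.
lb = np.zeros((T + 1, K))    # lb[T] = 0: no reward remains at the horizon

for t in range(T - 1, -1, -1):
    for j in range(K):
        # Worst-case distance from any state in cell j to each sample x_i.
        d = np.abs(centers[j] - xs) + 0.5 * h
        # Lipschitz lower bound on the immediate reward inside cell j.
        r_lo = rs - L_RHO * d
        # The true successor lies within L_F * d[i] of x_i'; take the
        # worst future bound over every cell that interval can intersect.
        v_next = np.empty(len(xs))
        for i in range(len(xs)):
            jlo = cell_of(xps[i] - L_F * d[i])
            jhi = cell_of(xps[i] + L_F * d[i])
            v_next[i] = lb[t + 1, jlo:jhi + 1].min()
        # Each sample yields a valid lower bound; keep the tightest one.
        lb[t, j] = np.max(r_lo + v_next)

print("Lower bound on the return from each cell at t = 0:")
print(np.round(lb[0], 3))
```

In this sketch the choice of regions is a uniform grid; as the abstract notes, the framework leaves the region shapes open, and choosing them differently (or bounding rewards by other prior knowledge) yields other instantiations, including the Fonteneau et al. bounds as a special case.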
