European Workshop on Reinforcement Learning

A Framework for Computing Bounds for the Return of a Policy

Abstract

We present a framework for computing bounds for the return of a policy in finite-horizon, continuous-state Markov Decision Processes with bounded state transitions. The state transition bounds can be based on either prior knowledge alone, or on a combination of prior knowledge and data. Our framework uses a piecewise-constant representation of the return bounds and a backwards iteration process. We instantiate this framework for a previously investigated type of prior knowledge - namely, Lipschitz continuity of the transition function. In this context, we show that the existing bounds of Fonteneau et al. (2009, 2010) can be expressed as a particular instantiation of our framework, by bounding the immediate rewards using Lipschitz continuity and choosing a particular form for the regions in the piecewise-constant representation. We also show how different instantiations of our framework can improve upon their bounds.
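To make the abstract's main ideas concrete, here is a minimal illustrative sketch of the kind of computation it describes: a piecewise-constant lower bound on the finite-horizon return, computed by backward iteration, with the immediate reward and successor state bounded via an assumed Lipschitz property. Everything specific here is an assumption for illustration: the 1-D state space, the uniform grid, the constants L_F and L_RHO, and the toy sample batch are all hypothetical, and this is one possible instantiation in the spirit of the framework, not the authors' implementation or the Fonteneau et al. bounds themselves.

```python
import numpy as np

# Illustrative sketch only (not the paper's code). Setting: 1-D state
# space [0, 1], a fixed policy with deterministic closed-loop dynamics and
# reward, both assumed Lipschitz with constants L_F and L_RHO. We compute
# a piecewise-constant LOWER bound on the T-step return by backward
# iteration over a uniform partition, using observed one-step transitions
# (x_i, r_i, x_i') collected under the evaluated policy.

L_F, L_RHO = 0.9, 1.0       # assumed Lipschitz constants (hypothetical)
T = 3                        # horizon
K = 50                       # number of cells in the piecewise-constant grid
edges = np.linspace(0.0, 1.0, K + 1)
centers = 0.5 * (edges[:-1] + edges[1:])
h = edges[1] - edges[0]      # cell width

rng = np.random.default_rng(0)
# Hypothetical batch from a toy Lipschitz system, for demonstration only.
xs = rng.uniform(0.0, 1.0, 40)
rs = 1.0 - np.abs(xs - 0.5)              # reward samples
xps = np.clip(0.8 * xs + 0.1, 0.0, 1.0)  # next-state samples

def cell_of(x):
    """Index of the grid cell containing state x."""
    return int(np.clip(np.searchsorted(edges, x, side="right") - 1, 0, K - 1))

# lb[t, j] lower-bounds the (T - t)-step return from any state in cell j.
lb = np.zeros((T + 1, K))    # lb[T] = 0: no reward remains at the horizon

for t in range(T - 1, -1, -1):
    for j in range(K):
        # Worst-case distance from any state in cell j to each sample x_i.
        d = np.abs(centers[j] - xs) + 0.5 * h
        # Lipschitz lower bound on the immediate reward inside cell j.
        r_lo = rs - L_RHO * d
        # The true successor lies within L_F * d[i] of x_i'; take the
        # worst future bound over every cell that interval can intersect.
        v_next = np.empty(len(xs))
        for i in range(len(xs)):
            jlo = cell_of(xps[i] - L_F * d[i])
            jhi = cell_of(xps[i] + L_F * d[i])
            v_next[i] = lb[t + 1, jlo:jhi + 1].min()
        # Each sample yields a valid lower bound; keep the tightest one.
        lb[t, j] = np.max(r_lo + v_next)

print("Lower bound on the return from each cell at t = 0:")
print(np.round(lb[0], 3))
```

In this sketch the choice of regions is a uniform grid; as the abstract notes, the framework leaves the region shapes open, and choosing them differently (or bounding rewards by other prior knowledge) yields other instantiations, including the Fonteneau et al. bounds as a special case.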
