Conference: European Conference on Machine Learning and Knowledge Discovery in Databases

Local Policy Search in a Convex Space and Conservative Policy Iteration as Boosted Policy Search

Abstract

Local Policy Search is a popular reinforcement learning approach for handling large state spaces. Formally, it searches locally in a parameterized policy space in order to maximize the associated value function averaged over some predefined distribution. In general, the best one can hope for from such an approach is a local optimum of this criterion. The first contribution of this article is the following surprising result: if the policy space is convex, any (approximate) local optimum enjoys a global performance guarantee. Unfortunately, the convexity assumption is strong: it is not satisfied by commonly used parameterizations, and designing a parameterization that induces this property seems hard. A natural way to alleviate this issue is to derive an algorithm that solves the local policy search problem using a boosting approach (constrained to the convex hull of the policy space). The resulting algorithm turns out to be a slight generalization of conservative policy iteration; thus, our second contribution is to highlight an original connection between local policy search and approximate dynamic programming.
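
The mixture update that keeps conservative policy iteration inside the convex hull of the policy space can be sketched on a small tabular problem. This is a minimal illustration, not the authors' algorithm: the random MDP, the exact greedy step, the fixed step size `alpha`, the uniform distribution `nu`, and all function names are assumptions made for the example, since the abstract does not specify them.

```python
import numpy as np

def policy_value(P, R, pi, gamma):
    """State values of a stochastic policy pi (shape [S, A]).

    P has shape [A, S, S] (transition probabilities), R has shape [S, A].
    Solves the linear Bellman equation v = r_pi + gamma * P_pi v.
    """
    n_states = R.shape[0]
    P_pi = np.einsum("sa,ast->st", pi, P)  # state-to-state transitions under pi
    r_pi = np.sum(pi * R, axis=1)          # expected one-step reward under pi
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

def cpi_step(P, R, pi, gamma, alpha):
    """One conservative update: mix pi with a greedy policy (assumed exact here)."""
    v = policy_value(P, R, pi, gamma)
    q = R + gamma * np.einsum("ast,t->sa", P, v)        # action values of pi
    greedy = np.eye(R.shape[1])[np.argmax(q, axis=1)]   # one-hot greedy policy
    # Convex combination: the result stays in the convex hull of policies.
    return (1.0 - alpha) * pi + alpha * greedy

def run_cpi(n_states=4, n_actions=2, gamma=0.9, alpha=0.3, n_iters=30, seed=0):
    """Run the sketch on a random MDP (hypothetical test problem)."""
    rng = np.random.default_rng(seed)
    P = rng.random((n_actions, n_states, n_states))
    P /= P.sum(axis=2, keepdims=True)       # normalize rows into distributions
    R = rng.random((n_states, n_actions))
    nu = np.full(n_states, 1.0 / n_states)  # assumed uniform averaging distribution
    pi = np.full((n_states, n_actions), 1.0 / n_actions)
    scores = []
    for _ in range(n_iters):
        scores.append(float(nu @ policy_value(P, R, pi, gamma)))
        pi = cpi_step(P, R, pi, gamma, alpha)
    return pi, scores

if __name__ == "__main__":
    final_pi, scores = run_cpi()
    print("first and last objective:", scores[0], scores[-1])
```

Each iterate is a convex combination of the previous policy and a deterministic greedy policy, so it never leaves the convex hull of the policies generated so far. With an exact greedy step, as assumed here, the averaged value does not decrease from one iteration to the next; with an approximate greedy step this would have to be ensured by the choice of step size.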
