Local Policy Search in a Convex Space and Conservative Policy Iteration as Boosted Policy Search

Abstract

Local Policy Search is a popular reinforcement learning approach for handling large state spaces. Formally, it searches locally in a parameterized policy space in order to maximize the associated value function averaged over some predefined distribution. In general, the best one can hope for from such an approach is a local optimum of this criterion. The first contribution of this article is the following surprising result: if the policy space is convex, any (approximate) local optimum enjoys a global performance guarantee. Unfortunately, the convexity assumption is strong: it is not satisfied by commonly used parameterizations, and designing a parameterization that induces this property seems hard. A natural way to alleviate this issue is to derive an algorithm that solves the local policy search problem using a boosting approach (constrained to the convex hull of the policy space). The resulting algorithm turns out to be a slight generalization of conservative policy iteration; thus, our second contribution is to highlight an original connection between local policy search and approximate dynamic programming.
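
To make the idea of searching inside the convex hull of a policy space concrete, here is a minimal sketch, not the authors' algorithm. It assumes a tiny tabular MDP with exactly known dynamics, an exact greedy step in place of the approximate (boosted) one discussed in the abstract, and a fixed mixing rate `alpha`. The arrays `P`, `R`, `nu` and all function names are hypothetical and chosen only for illustration. Each iteration moves the current stochastic policy toward a greedy policy by a convex combination, which is the kind of conservative update the abstract refers to.

```python
import numpy as np

def policy_value(P, R, pi, gamma):
    """Exact value function of a stochastic policy pi (shape S x A)."""
    n_states = P.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state kernel under pi
    r_pi = np.einsum("sa,sa->s", pi, R)    # expected one-step reward under pi
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

def greedy_policy(P, R, v, gamma):
    """Deterministic policy that is greedy with respect to v."""
    q = R + gamma * (P @ v)                # action values, shape S x A
    pi = np.zeros_like(R)
    pi[np.arange(R.shape[0]), q.argmax(axis=1)] = 1.0
    return pi

def mixture_step(pi, pi_greedy, alpha):
    """Convex combination: the result stays in the convex hull of policies."""
    return (1.0 - alpha) * pi + alpha * pi_greedy

# Hypothetical 2-state, 2-action MDP (assumption, for illustration only).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])   # P[s, a, s']
R = np.array([[0.0, 0.1],
              [0.5, 1.0]])                 # R[s, a]
gamma = 0.9
nu = np.array([0.5, 0.5])                  # predefined state distribution

pi = np.full((2, 2), 0.5)                  # start from the uniform policy
history = []                               # averaged value at each iteration
for _ in range(30):
    v = policy_value(P, R, pi, gamma)
    history.append(float(nu @ v))          # criterion: value averaged over nu
    pi = mixture_step(pi, greedy_policy(P, R, v, gamma), alpha=0.2)
```

In this exact tabular setting every mixture step has a non-negative advantage in each state, so the averaged value recorded in `history` never decreases. With approximate greedy steps, as in the setting the abstract describes, the step size would need to be chosen with more care.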

Bibliographic details

Source: European Conference on Machine Learning and Knowledge Discovery in Databases, 2014, 16 pages
Authors: Bruno Scherrer; Matthieu Geist
Original format: PDF
Chinese Library Classification: TP18-532
Keywords: Local Policy Search; Boosted Policy Search; Convex Space
