Bayesian Methods for Knowledge Transfer and Policy Search in Reinforcement Learning.

Abstract

How can an agent generalize its knowledge to new circumstances? To learn effectively, an agent acting in a sequential decision problem must make intelligent action-selection choices based on its available knowledge. This dissertation focuses on Bayesian methods of representing learned knowledge and develops novel algorithms that exploit the represented knowledge when selecting actions.

Our first contribution introduces the multi-task Reinforcement Learning setting, in which an agent solves a sequence of tasks. An agent equipped with knowledge of the relationship between tasks can transfer knowledge between them. We propose the transfer of two distinct types of knowledge: knowledge of domain models and knowledge of policies. To represent the transferable knowledge, we propose hierarchical Bayesian priors on domain models and on policies respectively. To transfer domain model knowledge, we introduce a new algorithm for model-based Bayesian Reinforcement Learning in the multi-task setting that exploits the learned hierarchical Bayesian model to improve exploration in related tasks. To transfer policy knowledge, we introduce a new policy search algorithm that accepts a policy prior as input and uses the prior to bias the search. A specific implementation of this algorithm accepts a hierarchical policy prior; it learns the hierarchical structure and reuses components of the structure in related tasks.

Our second contribution addresses the basic problem of generalizing knowledge gained from previously executed policies. Bayesian Optimization exploits a prior model of an objective function to quickly identify the point maximizing the modeled objective. Successful use of Bayesian Optimization in Reinforcement Learning requires a model relating policies to their performance; given such a model, Bayesian Optimization can be applied to search for an optimal policy. Early work using Bayesian Optimization in the Reinforcement Learning setting ignored the sequential nature of the underlying decision problem. The work presented in this thesis addresses this problem explicitly: we construct new Bayesian models that take advantage of sequence information to better generalize knowledge across policies. We empirically evaluate the approach on a variety of Reinforcement Learning benchmark problems, and experiments show that our method significantly reduces the amount of exploration required to identify the optimal policy.

Our final contribution is a new framework for learning parametric policies from queries presented to an expert. In many domains it is difficult to provide expert demonstrations of desired policies; however, it may still be simple for an expert to distinguish good performance from bad. To take advantage of this limited expert knowledge, our agent presents the expert with pairs of demonstrations and asks which demonstration best represents a latent target behavior. The goal is to elicit the latent behavior from the expert using a small number of queries. We formulate a Bayesian model of the querying process, an inference procedure that estimates the posterior distribution over the latent policy space, and an active procedure for selecting new queries to present to the expert. We show, in multiple domains, that the algorithm successfully learns the target policy and that the active learning strategy generally improves the speed of learning.
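As a concrete illustration of the first contribution's transfer idea (not the dissertation's actual algorithm), the sketch below fits a shared Beta prior to success rates observed across past tasks and uses it to warm-start Thompson sampling on a new, related task. A multi-armed Bernoulli bandit stands in for the full multi-task MDP setting; all names and constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for previously solved tasks: each task is a 2-armed Bernoulli
# bandit whose arm success rates are drawn from a shared, unknown Beta.
TRUE_HYPER = (8.0, 2.0)
past_rates = rng.beta(*TRUE_HYPER, size=(20, 2))   # 20 past tasks x 2 arms

# Learn a shared Beta(alpha0, beta0) prior by the method of moments;
# this plays the role of the learned hierarchical prior.
m, v = past_rates.mean(), past_rates.var()
s = m * (1.0 - m) / v - 1.0
alpha0, beta0 = m * s, (1.0 - m) * s

def thompson(prior, steps=300):
    """Thompson sampling on a fresh task drawn from the same hyperprior."""
    rates = rng.beta(*TRUE_HYPER, size=2)          # the new task's true arms
    wins, pulls, total = np.zeros(2), np.zeros(2), 0.0
    for _ in range(steps):
        # Sample one plausible model per arm from its Beta posterior...
        samples = rng.beta(prior[0] + wins, prior[1] + pulls - wins)
        a = int(np.argmax(samples))                # ...and act greedily on it.
        r = float(rng.random() < rates[a])
        wins[a] += r; pulls[a] += 1; total += r
    return total / steps

print("mean reward, transferred prior:", thompson((alpha0, beta0)))
print("mean reward, flat Beta(1,1):   ", thompson((1.0, 1.0)))
```

The transferred prior concentrates early exploration on plausible models, which is the effect the multi-task algorithm aims for in the much richer MDP setting.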
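The second contribution builds on Bayesian Optimization for policy search. A minimal, generic sketch of that loop, assuming a one-dimensional policy parameter, a standard RBF Gaussian-process surrogate, and the expected-improvement acquisition rule, might look like the following; the thesis's sequence-aware models go beyond this baseline.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Toy noisy "episodic return" of a 1-D policy parameter (unknown to the agent).
def episode_return(theta):
    return np.exp(-(theta - 0.6) ** 2 / 0.05) + 0.05 * rng.standard_normal()

def rbf(A, B, ell=0.15):
    d = A[:, None] - B[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

thetas = np.array([0.1, 0.9])                       # initial policy evaluations
returns = np.array([episode_return(t) for t in thetas])
grid = np.linspace(0.0, 1.0, 200)
for _ in range(15):
    # GP posterior over returns at every candidate policy parameter.
    K = rbf(thetas, thetas) + 1e-3 * np.eye(len(thetas))
    Ks = rbf(grid, thetas)
    mu = Ks @ np.linalg.solve(K, returns)
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    sd = np.sqrt(np.maximum(var, 1e-9))
    # Expected improvement over the best return seen so far.
    best = returns.max()
    z = (mu - best) / sd
    ei = (mu - best) * norm.cdf(z) + sd * norm.pdf(z)
    # Evaluate the most promising policy and fold it into the model.
    t_next = grid[np.argmax(ei)]
    thetas = np.append(thetas, t_next)
    returns = np.append(returns, episode_return(t_next))

print("best policy parameter found:", thetas[np.argmax(returns)])
```

The key point the abstract makes is that this surrogate treats policies as opaque points; exploiting the sequential structure of the decision problem lets the model generalize across policies with far fewer evaluations.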

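The final contribution's query loop can be sketched in miniature: a hypothetical Bradley-Terry-style expert prefers the demonstration closer to a latent target parameter, the agent maintains a gridded posterior over that parameter, and it actively picks the query pair whose answer is most uncertain. Everything here (the utility model, the grid, the uncertainty heuristic) is an assumption for illustration, not the dissertation's model.

```python
import numpy as np

rng = np.random.default_rng(2)

w_true = 0.37                                # latent target the expert has in mind
grid = np.linspace(0.0, 1.0, 201)            # hypothesis space for w
log_post = np.zeros(grid.size)               # uniform prior over the grid

def pref_prob(a, b, w, tau=0.1):
    """Bradley-Terry-style chance the expert prefers demo a over demo b."""
    ua, ub = -abs(a - w), -abs(b - w)        # utility = closeness to the target
    return 1.0 / (1.0 + np.exp(-(ua - ub) / tau))

for step in range(25):
    # Active query selection: propose random demo pairs and keep the one
    # whose predicted answer is most uncertain under the current posterior.
    post = np.exp(log_post - log_post.max()); post /= post.sum()
    pairs = rng.random((50, 2))
    p = np.array([(pref_prob(a, b, grid) * post).sum() for a, b in pairs])
    a, b = pairs[np.argmin(np.abs(p - 0.5))]
    # The simulated expert answers according to the latent target...
    ans = rng.random() < pref_prob(a, b, w_true)
    # ...and we perform a Bayesian update of the posterior over w.
    like = pref_prob(a, b, grid)
    log_post += np.log(like if ans else 1.0 - like)

post = np.exp(log_post - log_post.max()); post /= post.sum()
print("posterior mean of w:", (grid * post).sum(), "true w:", w_true)
```

Selecting maximally informative pairs is what makes a small number of queries sufficient, which mirrors the abstract's claim that the active strategy speeds up learning.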
Bibliographic record

  • Author

    Wilson, Aaron.

  • Author affiliation

    Oregon State University.

  • Degree-granting institution: Oregon State University.
  • Subject: Statistics; Computer Science.
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 171 p.
  • Total pages: 171
  • Format: PDF
  • Language: English
