Conference: Annual Conference on Neural Information Processing Systems
A Bayesian Approach for Policy Learning from Trajectory Preference Queries

Abstract

We consider the problem of learning control policies via trajectory preference queries to an expert. In particular, the agent presents an expert with short runs of a pair of policies originating from the same state and the expert indicates which trajectory is preferred. The agent's goal is to elicit a latent target policy from the expert with as few queries as possible. To tackle this problem we propose a novel Bayesian model of the querying process and introduce two methods that exploit this model to actively select expert queries. Experimental results on four benchmark problems indicate that our model can effectively learn policies from trajectory preference queries and that active query selection can be substantially more efficient than random selection.
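
The abstract does not spell out the Bayesian model, so the following is only a toy sketch of the general idea, not the paper's formulation. It assumes a finite set of candidate target policies, a hypothetical logistic response model in which the expert tends to prefer the trajectory whose actions disagree less often with the target policy, and a plain Bayes-rule update after one answered query. The names `mismatches`, `prefer_a_prob`, `update_posterior`, and the `temperature` parameter are all illustrative assumptions. Active query selection is not shown.

```python
import math

def mismatches(trajectory, policy):
    """Count steps where the trajectory's action differs from the policy's action."""
    return sum(1 for state, action in trajectory if policy[state] != action)

def prefer_a_prob(traj_a, traj_b, policy, temperature=1.0):
    """Assumed response model: P(expert prefers A | target policy), logistic link."""
    diff = mismatches(traj_b, policy) - mismatches(traj_a, policy)
    return 1.0 / (1.0 + math.exp(-diff / temperature))

def update_posterior(prior, policies, traj_a, traj_b, expert_prefers_a):
    """Bayes rule over a finite set of candidate target policies."""
    weights = []
    for p, policy in zip(prior, policies):
        like = prefer_a_prob(traj_a, traj_b, policy)
        weights.append(p * (like if expert_prefers_a else 1.0 - like))
    total = sum(weights)
    return [w / total for w in weights]

# Two hypothetical candidate policies on states 0..2 with actions "L" / "R".
policies = [{0: "R", 1: "R", 2: "R"}, {0: "L", 1: "L", 2: "L"}]
prior = [0.5, 0.5]

# A pair of short trajectories starting from the same state (state, action) pairs.
traj_a = [(0, "R"), (1, "R")]
traj_b = [(0, "L"), (1, "L")]

# The expert answers that trajectory A is preferred.
posterior = update_posterior(prior, policies, traj_a, traj_b, expert_prefers_a=True)
```

Under these assumptions, a single answer favoring trajectory A shifts posterior mass toward the candidate policy that always moves right, which is the kind of evidence accumulation a query-efficient learner relies on.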