

A Bayesian Approach for Policy Learning from Trajectory Preference Queries




Abstract

We consider the problem of learning control policies via trajectory preference queries to an expert. In particular, the agent presents an expert with short runs of a pair of policies originating from the same state and the expert indicates which trajectory is preferred. The agent's goal is to elicit a latent target policy from the expert with as few queries as possible. To tackle this problem we propose a novel Bayesian model of the querying process and introduce two methods that exploit this model to actively select expert queries. Experimental results on four benchmark problems indicate that our model can effectively learn policies from trajectory preference queries and that active query selection can be substantially more efficient than random selection.
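The abstract describes a Bayesian model of the querying process combined with active selection of expert queries. The paper's actual model and its two selection methods are not reproduced here. What follows is a toy sketch of the general idea under loudly stated assumptions. The hypothesis space is a hypothetical finite set of deterministic policies over a few abstract states. Each "trajectory" is abstracted to the policy's full action vector, so the same-start-state rollouts are not simulated. The expert follows an assumed logistic model with a hypothetical rationality parameter `BETA`. Expected information gain is a swapped-in selection criterion, compared against random selection.

```python
import itertools
import math
import random

# ASSUMPTIONS (hypothetical, not the paper's model): a tiny abstract problem with
# N_STATES states and N_ACTIONS actions; each "trajectory" is abstracted to the
# policy's full action vector; the expert is a noisy logistic chooser.
N_STATES = 4
N_ACTIONS = 2
BETA = 2.0  # hypothetical expert rationality: larger means less noisy answers

# Hypothesis space: every deterministic policy, as a tuple of one action per state.
POLICIES = list(itertools.product(range(N_ACTIONS), repeat=N_STATES))


def agreement(p, q):
    """Number of states on which policies p and q choose the same action."""
    return sum(a == b for a, b in zip(p, q))


def p_prefer_first(a, b, target):
    """Assumed likelihood that the expert prefers a over b, given the target policy."""
    diff = agreement(a, target) - agreement(b, target)
    return 1.0 / (1.0 + math.exp(-BETA * diff))


def update(posterior, a, b, prefer_a):
    """Bayes rule over the finite hypothesis space after one preference answer."""
    new = []
    for prob, target in zip(posterior, POLICIES):
        lik = p_prefer_first(a, b, target)
        new.append(prob * (lik if prefer_a else 1.0 - lik))
    z = sum(new)
    return [x / z for x in new]


def entropy(ps):
    return -sum(p * math.log(p) for p in ps if p > 0)


def expected_info_gain(posterior, a, b):
    """Expected drop in posterior entropy from asking the query (a, b)."""
    p_a = sum(prob * p_prefer_first(a, b, t) for prob, t in zip(posterior, POLICIES))
    h_after = (p_a * entropy(update(posterior, a, b, True))
               + (1.0 - p_a) * entropy(update(posterior, a, b, False)))
    return entropy(posterior) - h_after


def select_random(posterior, pairs, rng):
    return rng.choice(pairs)


def select_active(posterior, pairs, rng):
    # Greedy active selection: the pair with the largest expected information gain.
    return max(pairs, key=lambda ab: expected_info_gain(posterior, *ab))


def run(select, target, n_queries, rng):
    """Ask n_queries questions of a simulated expert; return the final posterior."""
    posterior = [1.0 / len(POLICIES)] * len(POLICIES)
    pairs = list(itertools.combinations(POLICIES, 2))
    for _ in range(n_queries):
        a, b = select(posterior, pairs, rng)
        # Simulated expert that answers according to the same assumed likelihood.
        prefer_a = rng.random() < p_prefer_first(a, b, target)
        posterior = update(posterior, a, b, prefer_a)
    return posterior


if __name__ == "__main__":
    target = (1, 0, 1, 1)  # hypothetical latent target policy
    for name, select in [("random", select_random), ("active", select_active)]:
        post = run(select, target, 10, random.Random(0))
        print(name, "P(target) =", round(post[POLICIES.index(target)], 3))
```

Which strategy ends with more mass on the target for a given seed depends on the simulated answers. The sketch only illustrates the structure: one posterior both models the expert's answers and drives the choice of the next query.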


Bibliographic details

Source: Annual conference on Neural Information Processing Systems, 2012, pp. 1133-1141 (9 pages)
Authors: Aaron Wilson, Alan Fern, Prasad Tadepalli
Original format: PDF

Similar documents


1. A preference-based approach for interactive weight learning: learning weights within a logic-based query language [J]. David Zellhoefer, Ingo Schmitt. Distributed and Parallel Databases, 2010, no. 1.
2. Evaluating monetary policy under preferences with zero wealth effect: A Bayesian approach [J]. Jaya Dey. Journal of Economic Dynamics and Control, 2014, issue jana.
3. Data-driven trajectory prediction with weather uncertainties: A Bayesian deep learning approach [J]. Pang Yutian, Zhao Xinyu, Yan Hao. Transportation research, 2021, issue Sepa.
4. A Bayesian Approach for Policy Learning from Trajectory Preference Queries [C]. Aaron Wilson, Alan Fern, Prasad Tadepalli. Annual conference on Neural Information Processing Systems, 2012.
5. A hierarchical Bayesian finite mixture multidimensional scaling approach for accommodating structural and preference heterogeneity in three way preference data [D]. Park, Joonwook. 2007.
6. Query Large Scale Microarray Compendium Datasets Using a Model-Based Bayesian Approach with Variable Selection [O]. Ming Hu, Zhaohui S. Qin. 2009.
7. Estimation of monetary policy preferences in a forward-looking model: a Bayesian approach [O]. Ilbas Pelin. NBB Working Papers No. 129, 13 March 2008.