JMLR: Workshop and Conference Proceedings

Batch Active Preference-Based Learning of Reward Functions


Abstract

Data generation and labeling are usually an expensive part of learning for robotics. While active learning methods are commonly used to tackle the former problem, preference-based learning is a concept that attempts to solve the latter by querying users with preference questions. In this paper, we will develop a new algorithm, batch active preference-based learning, that enables efficient learning of reward functions using as few data samples as possible while still having short query generation times. We introduce several approximations to the batch active learning problem, and provide theoretical guarantees for the convergence of our algorithms. Finally, we present our experimental results for a variety of robotics tasks in simulation. Our results suggest that our batch active learning algorithm requires only a few queries that are computed in a short amount of time. We then showcase our algorithm in a study to learn human users’ preferences.
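The abstract only sketches the approach, so the following is a heavily hedged illustration rather than the paper's algorithm: preference-based reward learning is commonly posed as inferring the weights of a linear reward r(ξ) = w·φ(ξ) from pairwise comparison queries, with each batch of queries chosen to be informative about the current belief over w. All names, the logistic (Bradley-Terry style) answer model, the sampling-based posterior, and the maximum-disagreement batch heuristic here are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (not from the paper): a linear reward r(xi) = w . phi(xi)
# over 2-D trajectory features. A simulated user answers preference queries
# according to a hidden TRUE_W; we recover its direction from the answers.
TRUE_W = np.array([0.8, -0.6])

def preference_loglik(w, queries, answers):
    # Logistic (Bradley-Terry style) answer model:
    # P(A preferred to B) = sigmoid(w . (phi_A - phi_B)).
    if not queries:
        return 0.0
    diffs = np.array([a - b for a, b in queries])          # shape (n, d)
    logits = (diffs @ w) * np.array(answers, dtype=float)  # answers in {+1, -1}
    return -np.logaddexp(0.0, -logits).sum()

def approx_posterior(queries, answers, n=2000):
    # Crude posterior over reward directions: sample unit vectors and
    # self-normalize their likelihood weights.
    cands = rng.normal(size=(n, 2))
    cands /= np.linalg.norm(cands, axis=1, keepdims=True)
    logw = np.array([preference_loglik(w, queries, answers) for w in cands])
    probs = np.exp(logw - logw.max())
    return cands, probs / probs.sum()

def pick_batch(cands, probs, pool, used, k=3):
    # Greedy batch selection: prefer queries on which the posterior splits
    # closest to 50/50 (maximum disagreement) -- a cheap stand-in for
    # per-query acquisition, in the spirit of batch active learning.
    scores = []
    for i, (a, b) in enumerate(pool):
        if i in used:
            scores.append(-1.0)
            continue
        p = probs @ ((cands @ (a - b)) > 0).astype(float)
        scores.append(min(p, 1.0 - p))
    return list(np.argsort(scores)[::-1][:k])

# Query pool: random feature pairs standing in for trajectory pairs.
pool = [(rng.normal(size=2), rng.normal(size=2)) for _ in range(50)]
queries, answers, used = [], [], set()
for _ in range(5):                     # 5 batches of 3 queries each
    cands, probs = approx_posterior(queries, answers)
    for i in pick_batch(cands, probs, pool, used):
        used.add(i)
        a, b = pool[i]
        queries.append((a, b))
        answers.append(1 if TRUE_W @ (a - b) > 0 else -1)  # noiseless user

cands, probs = approx_posterior(queries, answers)
w_est = probs @ cands
w_est /= np.linalg.norm(w_est)
print("estimated reward direction:", w_est)
```

Selecting an entire batch before re-fitting the belief trades a little per-query informativeness for far fewer acquisition rounds, which is the cost/quality trade-off the abstract's "short query generation times" refers to.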


