JMLR: Workshop and Conference Proceedings

Batch Active Preference-Based Learning of Reward Functions


Abstract

Data generation and labeling are usually an expensive part of learning for robotics. While active learning methods are commonly used to tackle the former problem, preference-based learning is a concept that attempts to solve the latter by querying users with preference questions. In this paper, we will develop a new algorithm, batch active preference-based learning, that enables efficient learning of reward functions using as few data samples as possible while still having short query generation times. We introduce several approximations to the batch active learning problem, and provide theoretical guarantees for the convergence of our algorithms. Finally, we present our experimental results for a variety of robotics tasks in simulation. Our results suggest that our batch active learning algorithm requires only a few queries that are computed in a short amount of time. We then showcase our algorithm in a study to learn human users’ preferences.
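The abstract only sketches the approach, so the following is a heavily hedged illustration rather than the paper's algorithm: preference-based reward learning is commonly posed as inferring the weights of a linear reward r(ξ) = w·φ(ξ) from pairwise comparison queries, with each batch of queries chosen to be informative about the current belief over w. All names, the logistic (Bradley-Terry style) answer model, the sampling-based posterior, and the maximum-disagreement batch heuristic here are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (not from the paper): a linear reward r(xi) = w . phi(xi)
# over 2-D trajectory features. A simulated user answers preference queries
# according to a hidden TRUE_W; we recover its direction from the answers.
TRUE_W = np.array([0.8, -0.6])

def preference_loglik(w, queries, answers):
    # Logistic (Bradley-Terry style) answer model:
    # P(A preferred to B) = sigmoid(w . (phi_A - phi_B)).
    if not queries:
        return 0.0
    diffs = np.array([a - b for a, b in queries])          # shape (n, d)
    logits = (diffs @ w) * np.array(answers, dtype=float)  # answers in {+1, -1}
    return -np.logaddexp(0.0, -logits).sum()

def approx_posterior(queries, answers, n=2000):
    # Crude posterior over reward directions: sample unit vectors and
    # self-normalize their likelihood weights.
    cands = rng.normal(size=(n, 2))
    cands /= np.linalg.norm(cands, axis=1, keepdims=True)
    logw = np.array([preference_loglik(w, queries, answers) for w in cands])
    probs = np.exp(logw - logw.max())
    return cands, probs / probs.sum()

def pick_batch(cands, probs, pool, used, k=3):
    # Greedy batch selection: prefer queries on which the posterior splits
    # closest to 50/50 (maximum disagreement) -- a cheap stand-in for
    # per-query acquisition, in the spirit of batch active learning.
    scores = []
    for i, (a, b) in enumerate(pool):
        if i in used:
            scores.append(-1.0)
            continue
        p = probs @ ((cands @ (a - b)) > 0).astype(float)
        scores.append(min(p, 1.0 - p))
    return list(np.argsort(scores)[::-1][:k])

# Query pool: random feature pairs standing in for trajectory pairs.
pool = [(rng.normal(size=2), rng.normal(size=2)) for _ in range(50)]
queries, answers, used = [], [], set()
for _ in range(5):                     # 5 batches of 3 queries each
    cands, probs = approx_posterior(queries, answers)
    for i in pick_batch(cands, probs, pool, used):
        used.add(i)
        a, b = pool[i]
        queries.append((a, b))
        answers.append(1 if TRUE_W @ (a - b) > 0 else -1)  # noiseless user

cands, probs = approx_posterior(queries, answers)
w_est = probs @ cands
w_est /= np.linalg.norm(w_est)
print("estimated reward direction:", w_est)
```

Selecting an entire batch before re-fitting the belief trades a little per-query informativeness for far fewer acquisition rounds, which is the cost/quality trade-off the abstract's "short query generation times" refers to.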


