JMLR: Workshop and Conference Proceedings

Learning Behaviors with Uncertain Human Feedback

Abstract

Human feedback is widely used to train agents in many domains. However, previous works rarely consider the uncertainty in the feedback humans provide, especially when the optimal actions are not obvious to the trainers. For example, the reward of a sub-optimal action can be stochastic and sometimes exceed that of the optimal action, which is common in games and in the real world. In such cases, trainers are likely to give positive feedback to sub-optimal actions and negative feedback to optimal actions. In some confusing situations they may give no feedback at all. Existing works, which use the Expectation Maximization (EM) algorithm and treat the feedback model as hidden parameters, do not consider uncertainty in either the learning environment or the human feedback. To address this challenge, we introduce a novel feedback model that accounts for the uncertainty of human feedback. However, this model makes the computations in the EM algorithm intractable. To this end, we propose a novel approximate EM algorithm in which the expectation step is approximated with gradient descent. Experimental results in both synthetic scenarios and two real-world scenarios with human participants demonstrate the superior performance of our proposed approach.
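The following minimal sketch makes the baseline setting concrete. It shows EM that treats the trainer's feedback model as hidden parameters, in the spirit of the existing works the abstract describes. Everything here is a hypothetical assumption for illustration: the single-state setting, the feedback model (a constant "wrong sign" rate `eps` and a silence probability `mu`), and the function names `feedback_likelihood` and `em_optimal_action`. This is not the paper's uncertainty-aware feedback model. It also does not implement the gradient-descent approximation of the E-step, whose details are not given in the abstract.

```python
import math


def feedback_likelihood(fb, is_optimal, eps, mu):
    """P(feedback | whether the action is optimal) under a hypothetical model.

    fb is +1 (positive), -1 (negative) or 0 (no feedback).
    mu  = probability the trainer stays silent (assumed parameter).
    eps = probability a non-silent trainer gives the "wrong" sign (assumed).
    """
    if fb == 0:
        return mu
    correct = (fb == 1) == is_optimal  # + on optimal, or - on sub-optimal
    return (1.0 - mu) * ((1.0 - eps) if correct else eps)


def em_optimal_action(data, n_actions, eps=0.3, iters=50):
    """Infer which action is optimal from (action, feedback) pairs via EM.

    The optimal action is the hidden variable; eps is the feedback-model
    parameter re-estimated in the M-step. Single state, uniform prior.
    """
    # mu does not depend on the hidden variable, so its maximum-likelihood
    # estimate is simply the fraction of silent trials.
    mu = sum(1 for _, f in data if f == 0) / len(data)
    nonzero = [(a, f) for a, f in data if f != 0]
    post = [1.0 / n_actions] * n_actions
    for _ in range(iters):
        # E-step: posterior over the hidden optimal action k (log-space).
        log_w = [
            sum(math.log(feedback_likelihood(f, a == k, eps, mu)) for a, f in data)
            for k in range(n_actions)
        ]
        top = max(log_w)
        w = [math.exp(x - top) for x in log_w]
        total = sum(w)
        post = [x / total for x in w]
        if not nonzero:
            break  # no signed feedback: eps cannot be re-estimated
        # M-step: eps = expected fraction of signed feedback with the wrong
        # sign; post[a] is the probability that action a is the optimal one.
        wrong = sum((1.0 - post[a]) if f == 1 else post[a] for a, f in nonzero)
        # Keep eps in (0, 0.5): finite logs, and "positive" still means "good".
        eps = min(max(wrong / len(nonzero), 1e-6), 0.5 - 1e-6)
    return post, eps, mu


# Toy data: action 2 is mostly praised, but sub-optimal action 0 is sometimes
# praised and action 2 sometimes criticised, mimicking a confused trainer.
data = ([(2, 1)] * 6 + [(2, -1)] * 2 + [(0, 1)] * 2 + [(0, -1)] * 5
        + [(1, -1)] * 4 + [(1, 0)] * 3 + [(2, 0)] * 2)
post, eps, mu = em_optimal_action(data, n_actions=3)
best = max(range(3), key=lambda k: post[k])
print(best, round(eps, 3), round(mu, 3))
```

In this sketch `eps` is a single constant shared by every action and situation. The model therefore has no notion of some situations being more confusing than others, which is the kind of uncertainty the abstract argues should be modelled explicitly.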