Learning Behaviors with Uncertain Human Feedback

Xu He; Haipeng Chen; Bo An

Abstract

Human feedback is widely used to train agents in many domains. However, previous work rarely considers the uncertainty in the feedback that humans provide, especially when the optimal actions are not obvious to the trainers. For example, the reward of a sub-optimal action can be stochastic and sometimes exceed that of the optimal action, which is common in games and in the real world. Trainers are then likely to give positive feedback to sub-optimal actions, negative feedback to optimal actions, and even no feedback at all in confusing situations. Existing approaches, which use the Expectation Maximization (EM) algorithm and treat the feedback model as hidden parameters, do not account for uncertainty in the learning environment and in human feedback. To address this challenge, we introduce a novel feedback model that considers the uncertainty of human feedback. However, this makes the computation in the EM algorithm intractable. To this end, we propose a novel approximate EM algorithm, in which we approximate the expectation step with the Gradient Descent method. Experimental results in both synthetic scenarios and two real-world scenarios with human participants demonstrate the superior performance of our proposed approach.
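The abstract's point about stochastic rewards can be illustrated with a small simulation. The sketch below is not taken from the paper: the Gaussian reward distributions, the means and standard deviation, and the function name `fraction_suboptimal_wins` are all assumptions chosen only for illustration. It estimates how often a single sampled reward from a sub-optimal action exceeds one from the optimal action; with the assumed values this happens in roughly 39% of trials, which suggests why a trainer who sees only sampled outcomes could plausibly reward the wrong action.

```python
import random


def fraction_suboptimal_wins(mean_opt=1.0, mean_sub=0.8, std=0.5,
                             n_trials=100_000, seed=0):
    """Estimate how often a sub-optimal action's sampled reward beats
    the optimal action's sampled reward.

    All parameters are hypothetical: rewards are assumed to be Gaussian
    with a shared standard deviation, which the source does not specify.
    """
    rng = random.Random(seed)  # local generator, so results are repeatable
    wins = 0
    for _ in range(n_trials):
        reward_opt = rng.gauss(mean_opt, std)  # one draw for the optimal action
        reward_sub = rng.gauss(mean_sub, std)  # one draw for the sub-optimal action
        if reward_sub > reward_opt:
            wins += 1
    return wins / n_trials


if __name__ == "__main__":
    frac = fraction_suboptimal_wins()
    print(f"Sub-optimal action looked better in {frac:.1%} of trials")
```

Shrinking `std` or widening the gap between the two means drives the fraction toward zero, which corresponds to the easier setting where the optimal action is obvious to the trainer.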


Bibliographic details

Source: JMLR: Workshop and Conference Proceedings, 2020, issue 2010, 10 pages
Authors: Xu He; Haipeng Chen; Bo An
Original format: PDF

