Journal: Artificial Intelligence

Teachable robots: Understanding human teaching behavior to build more effective robot learners

Abstract

While Reinforcement Learning (RL) is not traditionally designed for interactive supervisory input from a human teacher, several works on both robotic and software agents have adapted it for human input by letting a human trainer control the reward signal. In this work, we experimentally examine the assumption underlying these works, namely that the human-given reward is compatible with the traditional RL reward signal. We describe an experimental platform with a simulated RL robot and present an analysis of real-time human teaching behavior observed in a study in which untrained subjects taught the robot to perform a new task. We report three main observations on how people administer feedback when teaching a Reinforcement Learning agent: (a) they use the reward channel not only for feedback, but also for future-directed guidance; (b) they have a positive bias in their feedback, possibly using the signal as a motivational channel; and (c) they change their behavior as they develop a mental model of the robotic learner. Based on these findings, we made specific modifications to the simulated RL robot, and analyzed and evaluated its learning behavior in four follow-up experiments with human trainers. We report significant improvements on several learning measures. This work demonstrates the importance of understanding the human-teacher/robot-learner partnership in order to design algorithms that support how people want to teach while simultaneously improving the robot's learning behavior.
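The assumption under test is that a human-given reward can stand in for the environment's reward in a standard RL update. A minimal sketch makes this concrete. It assumes a tabular Q-learning agent, which is a common RL choice but is not confirmed by this abstract as the paper's exact algorithm. The names `InteractiveQLearner`, `run_episode` and `simulated_trainer`, the toy corridor task and all parameter values are hypothetical. A scripted function replaces the human trainer so the example can run on its own.

```python
import random
from collections import defaultdict
from collections.abc import Hashable, Sequence

State = Hashable
Action = Hashable


class InteractiveQLearner:
    """Hypothetical tabular Q-learner whose reward comes from an external
    (human) source instead of the environment. Illustrative sketch only."""

    def __init__(self, actions: Sequence[Action], alpha: float = 0.3,
                 gamma: float = 0.75, epsilon: float = 0.1, seed: int = 0):
        self.actions = list(actions)
        self.alpha = alpha          # learning rate
        self.gamma = gamma          # discount factor
        self.epsilon = epsilon      # exploration probability (epsilon-greedy)
        self.q = defaultdict(float)  # (state, action) -> value estimate, default 0.0
        self.rng = random.Random(seed)

    def choose(self, state: State) -> Action:
        # Epsilon-greedy: explore at random with probability epsilon, else exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state: State, action: Action, reward: float,
               next_state: State, terminal: bool = False) -> None:
        # Standard one-step Q-learning target; the only change is that
        # `reward` is whatever the (human) trainer gave for this transition.
        best_next = 0.0 if terminal else max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


def simulated_trainer(state: int, action: str, next_state: int) -> float:
    # Hypothetical stand-in for a human: approve moves toward the goal, scold others.
    return 1.0 if next_state > state else -1.0


def run_episode(learner: InteractiveQLearner, reward_fn, start: int = 0,
                goal: int = 3, max_steps: int = 20) -> None:
    # Toy 1-D corridor: states 0..goal, actions move one cell left or right.
    state = start
    for _ in range(max_steps):
        action = learner.choose(state)
        next_state = min(goal, state + 1) if action == "right" else max(0, state - 1)
        reward = reward_fn(state, action, next_state)  # reward from the "trainer"
        done = next_state == goal
        learner.update(state, action, reward, next_state, terminal=done)
        state = next_state
        if done:
            break


learner = InteractiveQLearner(actions=["left", "right"])
for _ in range(50):
    run_episode(learner, simulated_trainer)

print(learner.q[(0, "right")], learner.q[(0, "left")])
```

The update rule is left unchanged on purpose. In this sketch, any tendencies in how a person gives reward pass directly into the value estimates. Examples are the positive bias, or rewards meant as guidance for a future action being credited to the previous one. That is exactly the compatibility assumption the study examines.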