Published in: International Conference on Advanced Robotics

COACH: Learning continuous actions from COrrective Advice Communicated by Humans

Abstract

This paper proposes COACH (COrrective Advice Communicated by Humans), a new interactive learning framework that allows non-expert humans to shape a policy through corrective advice, given as a binary signal in the action domain of the agent. One of the main innovative features of COACH is a mechanism that adaptively adjusts the amount of human feedback a given action receives, taking past feedback into consideration. The performance of COACH is compared with that of TAMER (Teaching an Agent Manually via Evaluative Reinforcement), ACTAMER (Actor-Critic TAMER), and an autonomous agent trained using SARSA(λ) in two reinforcement learning problems. COACH outperforms all other learning frameworks in the reported experiments. In addition, the results show that COACH can successfully transfer human knowledge to agents with continuous actions, which makes it a complementary approach to TAMER, which is appropriate for teaching in discrete action domains.
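
The idea of binary corrective advice with an adaptively scaled effect can be illustrated with a toy sketch. This is not the authors' algorithm: the class `CorrectiveLearner`, its fields, the exponential smoothing rule, and all constants are hypothetical choices made only for illustration, and the action is kept one-dimensional and state-independent for brevity. It shows advice of +1 ("increase the action") or -1 ("decrease the action") producing larger corrections when past advice has been consistent and smaller ones when it has alternated.

```python
from dataclasses import dataclass


@dataclass
class CorrectiveLearner:
    """Toy 1-D learner; every name and constant here is an illustrative assumption."""
    action: float = 0.0      # current continuous action
    trend: float = 0.0       # smoothed summary of past advice, stays in [-1, 1]
    base_step: float = 0.1   # minimum correction size
    smoothing: float = 0.5   # weight given to the newest advice in the trend

    def advise(self, h: int) -> float:
        """Apply binary advice h (+1 or -1) and return the correction size used."""
        if h not in (-1, 1):
            raise ValueError("advice must be +1 or -1")
        # Fold the new advice into the running summary of past advice.
        self.trend = (1 - self.smoothing) * self.trend + self.smoothing * h
        # Consistent advice pushes |trend| toward 1 (larger step);
        # alternating advice pulls it toward 0 (smaller step).
        step = self.base_step + abs(self.trend)
        self.action += h * step
        return step


learner = CorrectiveLearner()
steps = [learner.advise(h) for h in (+1, +1, -1)]
```

With the defaults above, two consecutive "increase" signals grow the correction size, and a subsequent "decrease" signal shrinks it, which mirrors the intuition that repeated advice in one direction asks for a large change while a reversal asks for fine-tuning.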
