Conference: International Symposium on Micro-NanoMechatronics and Human Science

Self-generation of reward by human interaction — Adaptation to multitask by reflecting hope degree for priority

Abstract

In recent years, robots have been expected to meet human needs in human living spaces, where handling multiple tasks requires complex and flexible behavior. Reinforcement learning has been studied for such robots because it is highly applicable to real environments. Using reinforcement learning, however, requires designing a reward function. This paper proposes self-generation of reward, using general indicators as the basis of the reward function. As general indicators, we set indicators that mimic a creature's sensory organs. With these indicators, a reward is generated from the pleasantness or unpleasantness felt in response to sensory input. Creatures are thought to feel unpleasant when an input is too strong or too weak, and pleasant when it is just right. Likewise, creatures are thought to feel pleasant when the input is easy to predict and unpleasant when it is difficult to predict. The index therefore derives pleasantness and unpleasantness from two quantities: the strength of the input and the predictability of the input. Based on this index, the reward function gives a large reward when the feeling is pleasant and a small reward when it is unpleasant. External input arises from interaction with the environment, and the reward is generated from that input using this index. Using this indicator across tasks eliminates the need to design a reward function for each individual task.
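The abstract does not give the functional form of the index, so the following is only a minimal sketch of the idea it describes. It assumes inputs normalized to [0, 1], a Gaussian-shaped pleasantness curve over input strength, an exponential decay over prediction error, and an equal-weight combination. All names and parameters (`preferred`, `width`, `scale`, the weights, and the `RunningPredictor` helper) are hypothetical choices for illustration and are not taken from the paper.

```python
import math


def intensity_pleasantness(x, preferred=0.5, width=0.2):
    """Pleasantness from input strength (assumed Gaussian shape).

    Returns 1.0 when the input is at the assumed 'just right' level and
    falls toward 0 when the input is too weak or too strong.
    """
    return math.exp(-((x - preferred) ** 2) / (2 * width ** 2))


def predictability_pleasantness(predicted, actual, scale=0.2):
    """Pleasantness from predictability (assumed exponential decay).

    Returns 1.0 when the prediction matches the input and decays as the
    absolute prediction error grows.
    """
    return math.exp(-abs(predicted - actual) / scale)


def self_generated_reward(x, predicted, w_intensity=0.5, w_predict=0.5):
    """Combine both terms into one reward (weights are hypothetical)."""
    return (w_intensity * intensity_pleasantness(x)
            + w_predict * predictability_pleasantness(predicted, x))


class RunningPredictor:
    """Stand-in input predictor: an exponential moving average."""

    def __init__(self, alpha=0.3, initial=0.0):
        self.alpha = alpha
        self.estimate = initial

    def predict(self):
        return self.estimate

    def update(self, x):
        # Move the estimate a fraction alpha toward the new observation.
        self.estimate += self.alpha * (x - self.estimate)


if __name__ == "__main__":
    predictor = RunningPredictor(initial=0.5)
    for sensed in [0.5, 0.55, 0.9, 0.1]:
        reward = self_generated_reward(sensed, predictor.predict())
        print(f"input={sensed:.2f} reward={reward:.3f}")
        predictor.update(sensed)
```

In this sketch the same reward function can be reused for any task that exposes a normalized sensory input, which mirrors the abstract's claim that a general index removes the need for per-task reward design.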
