International Symposium on Micro-NanoMechatronics and Human Science

Self-generation of reward by human interaction — Adaptation to multitask by reflecting hope degree for priority


Abstract

In recent years, robots have been required to meet human needs in human living spaces, and handling multiple tasks in such spaces demands complicated, flexible behavior. Research on such robots has proceeded using reinforcement learning, which is highly applicable to real environments. Reinforcement learning, however, requires the design of a reward function. This paper proposes self-generation of reward using general indicators for the reward function. As general indicators, we set indicators that mimic a creature's sensory organs. In these indicators, a reward is generated from pleasure and unpleasantness in response to sensory input. Creatures are thought to feel unpleasant when input is too strong or too weak and pleasant when it is just right; they are also thought to feel pleasant when input is easy to predict and unpleasant when it is difficult to predict. The index therefore derives pleasure and unpleasantness from the strength of the input and its predictability. The reward function gives a big reward when the agent feels pleasant on this index and a small reward when it feels unpleasant. External input is generated through interaction with the environment, and reward is generated using this index. Using this indicator across tasks eliminates the need to design a reward function for each individual task.
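The reward-generation scheme described in the abstract can be sketched in code. The function names, thresholds, and weights below are illustrative assumptions, not the authors' actual implementation; the sketch only shows the stated idea that reward is large when input strength is "just right" and prediction is easy, and small otherwise.

```python
def intensity_pleasure(x, low=0.2, high=0.8):
    """Pleasant (1.0) when sensory input strength is 'just right',
    unpleasant (0.0) when it is too weak or too strong.
    Thresholds are assumed for illustration."""
    return 1.0 if low <= x <= high else 0.0

def prediction_pleasure(x, x_pred, tol=0.1):
    """Pleasant when the input was easy to predict (small prediction
    error), unpleasant when prediction was difficult."""
    return 1.0 if abs(x - x_pred) <= tol else 0.0

def self_generated_reward(x, x_pred, w_int=0.5, w_pred=0.5,
                          big=1.0, small=-1.0):
    """Combine both indicators: a big reward when the agent 'feels
    pleasant' on this index, a small reward when it feels unpleasant."""
    pleasure = (w_int * intensity_pleasure(x)
                + w_pred * prediction_pleasure(x, x_pred))
    return big if pleasure >= 0.5 else small

# Moderate, well-predicted input -> big reward
print(self_generated_reward(0.5, 0.55))   # 1.0
# Overly strong, poorly predicted input -> small reward
print(self_generated_reward(0.95, 0.3))   # -1.0
```

Because the indicator depends only on generic sensory quantities (input strength and predictability), the same reward function can in principle be reused across tasks, which is the task-independence claim made in the abstract.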
