Self-generation of reward by human interaction — Adaptation to multitask by reflecting hope degree for priority

Abstract

In recent years, robots have been required to meet human needs in human living spaces, where handling multiple tasks calls for complex and flexible behavior. Research on such robots has proceeded using reinforcement learning, which is highly applicable to real environments. Using reinforcement learning requires designing a reward function. This paper proposes self-generation of reward, using general indicators as the reward function. As general indicators, we set indicators that mimic a creature's sensory organs. With these indicators, a reward is generated from the pleasantness or unpleasantness felt in response to sensory input. Creatures are thought to feel unpleasant when an input is too strong or too weak and pleasant when it is just right. Likewise, they are thought to feel pleasant when an input is easy to predict and unpleasant when it is hard to predict. The indicators therefore generate pleasantness and unpleasantness from the strength of the input and its predictability. Based on these indicators, the reward function gives a large reward when the robot feels pleasant and a small reward when it feels unpleasant. External input arises from interaction with the environment, and the reward is generated from it using these indicators. Applying the indicators across tasks eliminates the need to design a reward function for each individual task.
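
The abstract gives no formulas, so the following is only a minimal sketch of how such an indicator could be written, not the authors' implementation. It assumes a scalar sensory input in [0, 1], a Gaussian-shaped pleasantness around a hypothetical preferred strength, an exponential penalty on prediction error, a moving-average predictor, and an equal-weight average of the two terms. All names and parameter values (`preferred`, `width`, `scale`, `rate`) are illustrative assumptions.

```python
import math


def intensity_pleasantness(x, preferred=0.5, width=0.2):
    """Assumed shape: 1.0 when the input strength is 'just right',
    falling toward 0 when it is too strong or too weak."""
    return math.exp(-((x - preferred) ** 2) / (2.0 * width ** 2))


def predictability_pleasantness(x, predicted, scale=0.2):
    """Assumed shape: 1.0 when the input matches the prediction,
    decaying as the absolute prediction error grows."""
    return math.exp(-abs(x - predicted) / scale)


class SensoryReward:
    """Hypothetical task-independent reward built from one sensory input."""

    def __init__(self, preferred=0.5, width=0.2, scale=0.2, rate=0.3):
        self.preferred = preferred
        self.width = width
        self.scale = scale
        self.rate = rate          # smoothing rate of the moving-average predictor
        self.predicted = None     # no prediction before the first input

    def step(self, x):
        """Return a reward in [0, 1] for input x and update the prediction."""
        if self.predicted is None:
            self.predicted = x    # first input is trivially 'predicted'
        p_strength = intensity_pleasantness(x, self.preferred, self.width)
        p_predict = predictability_pleasantness(x, self.predicted, self.scale)
        # Exponential moving average stands in for the input predictor.
        self.predicted += self.rate * (x - self.predicted)
        # Equal weighting of the two terms is an assumption.
        return 0.5 * (p_strength + p_predict)


if __name__ == "__main__":
    reward = SensoryReward()
    for x in [0.5, 0.5, 1.0, 0.5]:
        print(f"input={x:.1f} reward={reward.step(x):.3f}")
```

Because the reward depends only on the strength and predictability of the input, the same object can in principle be reused across tasks without task-specific reward design, which is the property the abstract emphasizes.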

Bibliographic record

Source: International Symposium on Micro-NanoMechatronics and Human Science, 2017, pp. 1-2 (2 pages)
Authors: Shirakura Seiya; Takuya Masaki; Masaya Ishizuka; Kentarou Kurashige
Original format: PDF
Keywords: human-robot interaction; learning (artificial intelligence)

Similar literature

1. Multitasking: "I'm both proud of and repelled by my ability to multitask, since it reflects the best of our human ability to perform complex tasks, but also the worst of our capacity to accept an increasing amount of chaos and tension in our lives" [J]. Mark Winston. Bee Culture, 1999, No. 5.

2. Interactions Between Unsupervised Learning and the Degree of Spectral Mismatch on Short-Term Perceptual Adaptation to Spectrally Shifted Speech [J]. Tianhao Li, John J. Galvin III, Qian-Jie Fu. Ear and Hearing, 2009, No. 2.

3. Sustainable adaptation and human security: interactions between pastoral and agropastoral groups in dryland Kenya. (Special Issue: Sustainable adaptation to climate change: Prioritising social equity and environmental integrity.) [J]. Owuor B., Mauta W., Eriksen S. Climate and Development, 2011, No. 1.

4. Self-generation of reward by human interaction — Adaptation to multitask by reflecting hope degree for priority [C]. Shirakura Seiya, Takuya Masaki, Masaya Ishizuka, et al. International Symposium on Micro-NanoMechatronics and Human Science, 2017.

5. Modeling Human Adaptation with Game-Theoretic Intention Decoding in Human-Robot Interactions [D]. Wang, Yiwei. 2021.

6. Interactions Between Unsupervised Learning and the Degree of Spectral Mismatch on Short-Term Perceptual Adaptation to Spectrally-Shifted Speech [O]. Tianhao Li, John J. Galvin III, Qian-Jie Fu.

7. Self-Generation of Reward Based on Sensory Irritation Resulted from Interaction Between a Human and a Robot [O]. Kentarou Kurashige, Kaoru Nikaido. 2015.

8. Department of Health and Human Services, Office of the Assistant Secretary for Preparedness and Response Budget in Brief, FY 2013. An Overview of ASPR's Budget and the Strategic Priorities It Reflects. [R]. 2012.
