International Symposium on Micro-NanoMechatronics and Human Science

Self-generation of reward by human interaction — Adaptation to multitask by reflecting hope degree for priority


Abstract

In recent years, robots have been required to meet human needs in human living spaces, and handling multiple tasks in such spaces demands complicated, flexible behavior. Research on such robots has proceeded using reinforcement learning, which is highly applicable to real environments. Reinforcement learning, however, requires the design of a reward function. This paper proposes self-generation of reward using general indicators for the reward function. As general indicators, we set indicators that mimic a creature's sensory organs. In these indicators, a reward is generated from pleasure and unpleasantness in response to sensory input. Creatures are thought to feel unpleasant when input is too strong or too weak and pleasant when it is just right; they are also thought to feel pleasant when input is easy to predict and unpleasant when it is difficult to predict. The index therefore derives pleasure and unpleasantness from the strength of the input and its predictability. The reward function gives a big reward when the agent feels pleasant on this index and a small reward when it feels unpleasant. External input is generated through interaction with the environment, and reward is generated using this index. Using this indicator across tasks eliminates the need to design a reward function for each individual task.
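The reward-generation scheme described in the abstract can be sketched in code. The function names, thresholds, and weights below are illustrative assumptions, not the authors' actual implementation; the sketch only shows the stated idea that reward is large when input strength is "just right" and prediction is easy, and small otherwise.

```python
def intensity_pleasure(x, low=0.2, high=0.8):
    """Pleasant (1.0) when sensory input strength is 'just right',
    unpleasant (0.0) when it is too weak or too strong.
    Thresholds are assumed for illustration."""
    return 1.0 if low <= x <= high else 0.0

def prediction_pleasure(x, x_pred, tol=0.1):
    """Pleasant when the input was easy to predict (small prediction
    error), unpleasant when prediction was difficult."""
    return 1.0 if abs(x - x_pred) <= tol else 0.0

def self_generated_reward(x, x_pred, w_int=0.5, w_pred=0.5,
                          big=1.0, small=-1.0):
    """Combine both indicators: a big reward when the agent 'feels
    pleasant' on this index, a small reward when it feels unpleasant."""
    pleasure = (w_int * intensity_pleasure(x)
                + w_pred * prediction_pleasure(x, x_pred))
    return big if pleasure >= 0.5 else small

# Moderate, well-predicted input -> big reward
print(self_generated_reward(0.5, 0.55))   # 1.0
# Overly strong, poorly predicted input -> small reward
print(self_generated_reward(0.95, 0.3))   # -1.0
```

Because the indicator depends only on generic sensory quantities (input strength and predictability), the same reward function can in principle be reused across tasks, which is the task-independence claim made in the abstract.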
