IEEE Symposium Series on Computational Intelligence

Self-generation of reward based on sensor value -Improving reward accuracy by associating multiple sensors using Hebb’s rule-



Abstract

Reinforcement learning (RL) is a method in which an agent learns a desired behavior through interaction with the environment. The agent learns its actions based on a reward, which is designed by a human in advance. However, since the reward must be redefined every time the environment or purpose changes, the agent cannot adapt to a variety of environments. To address this, in our previous research we proposed a method that evaluates a sensor's input value using a Universal Sensor Evaluation Index (USEI), which can be applied in any environment, and generates a self-reward based on that evaluation. However, because that method evaluates each sensor in isolation, an input can be evaluated only by the sensor that directly receives it, and information available from other sensors is ignored; as a result, the accuracy of the reward generated by the previous method is low. If the reward accuracy is low, the robot may be damaged, because danger can be recognized only after a dangerous input has already been received. In this study, we propose a method for generating highly accurate rewards by associating individual sensors using Hebb's rule, a model of the plasticity of neuronal synapses, and evaluating inputs across multiple sensors. With the proposed method, an input can be evaluated in light of multiple sensor inputs, the range of danger recognition can be broadened, and danger prediction can be expected.
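To make the idea concrete, the sketch below shows one plausible reading of the proposed mechanism: each sensor's raw value is scored by a USEI-like function, Hebb's rule strengthens associations between sensors whose "danger" activations co-occur, and the learned associations let danger observed by one sensor lower the self-generated reward via its associated sensors. The `usei` function, the learning rate `eta`, the `decay` term, and the fusion rule are all assumptions for illustration, not the paper's definitions.

```python
import numpy as np

N_SENSORS = 3           # e.g. touch, proximity, brightness (assumed)
eta, decay = 0.1, 0.01  # Hebbian learning rate and weight decay (assumed)
W = np.zeros((N_SENSORS, N_SENSORS))  # association weights between sensors


def usei(x):
    """Placeholder for the paper's Universal Sensor Evaluation Index:
    maps raw sensor values in [0, 1] to evaluations in [0, 1], where
    values far from the dangerous extreme score high. This quadratic
    form is an assumption, not the paper's definition."""
    return 1.0 - np.clip(x, 0.0, 1.0) ** 2


def step(raw):
    """One interaction step: evaluate each sensor, strengthen associations
    between co-active 'danger' signals (Hebb's rule), then fuse the
    per-sensor evaluations through the learned weights into one reward."""
    global W
    e = usei(raw)               # per-sensor evaluations
    a = 1.0 - e                 # per-sensor "danger" activations
    W += eta * np.outer(a, a)   # Hebb: co-active sensors associate
    np.fill_diagonal(W, 0.0)    # no self-association
    W *= (1.0 - decay)          # forget stale associations
    # Danger seen by one sensor propagates to its associates, so the
    # reward drops even when the dangerous input is only predicted.
    assoc_danger = W @ a / max(W.sum(axis=1).max(), 1e-8)
    fused = e * (1.0 - np.clip(assoc_danger, 0.0, 1.0))
    return fused.mean()         # self-generated reward


# Toy run: sensors 0 (touch) and 1 (proximity) fire together during
# dangerous episodes, so proximity alone later lowers the reward.
for _ in range(200):
    step(np.array([0.9, 0.9, 0.1]))     # correlated dangerous inputs
print(step(np.array([0.0, 0.9, 0.1])))  # proximity alone now predicts danger
```

In this toy run, after the correlated episodes the proximity reading depresses the reward even though the touch sensor reports nothing, which is the broadened danger recognition the abstract describes.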
