IEEE Symposium Series on Computational Intelligence

Self-generation of reward based on sensor value -Improving reward accuracy by associating multiple sensors using Hebb’s rule-



Abstract

Reinforcement learning (RL) is a method in which an agent learns a desired behavior through interaction with the environment. The agent learns its actions based on a reward, which is designed by a human in advance. However, since the reward must be redefined every time the environment or purpose changes, the agent cannot adapt to a variety of environments. To address this, in our previous research we proposed a method that evaluates a sensor's input value using a Universal Sensor Evaluation Index (USEI), which can be applied in any environment, and generates a self-reward based on that evaluation. However, because that method evaluates each sensor in isolation, an input can be evaluated only by the sensor that directly receives it, and information available from other sensors is ignored; as a result, the accuracy of the reward generated by the previous method is low. If the reward accuracy is low, the robot may be damaged, because danger can be recognized only after a dangerous input has already been received. In this study, we propose a method for generating highly accurate rewards by associating individual sensors using Hebb's rule, a model of the plasticity of neuronal synapses, and evaluating inputs across multiple sensors. With the proposed method, an input can be evaluated in light of multiple sensor inputs, the range of danger recognition can be broadened, and danger prediction can be expected.
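To make the idea concrete, the sketch below shows one plausible reading of the proposed mechanism: each sensor's raw value is scored by a USEI-like function, Hebb's rule strengthens associations between sensors whose "danger" activations co-occur, and the learned associations let danger observed by one sensor lower the self-generated reward via its associated sensors. The `usei` function, the learning rate `eta`, the `decay` term, and the fusion rule are all assumptions for illustration, not the paper's definitions.

```python
import numpy as np

N_SENSORS = 3           # e.g. touch, proximity, brightness (assumed)
eta, decay = 0.1, 0.01  # Hebbian learning rate and weight decay (assumed)
W = np.zeros((N_SENSORS, N_SENSORS))  # association weights between sensors


def usei(x):
    """Placeholder for the paper's Universal Sensor Evaluation Index:
    maps raw sensor values in [0, 1] to evaluations in [0, 1], where
    values far from the dangerous extreme score high. This quadratic
    form is an assumption, not the paper's definition."""
    return 1.0 - np.clip(x, 0.0, 1.0) ** 2


def step(raw):
    """One interaction step: evaluate each sensor, strengthen associations
    between co-active 'danger' signals (Hebb's rule), then fuse the
    per-sensor evaluations through the learned weights into one reward."""
    global W
    e = usei(raw)               # per-sensor evaluations
    a = 1.0 - e                 # per-sensor "danger" activations
    W += eta * np.outer(a, a)   # Hebb: co-active sensors associate
    np.fill_diagonal(W, 0.0)    # no self-association
    W *= (1.0 - decay)          # forget stale associations
    # Danger seen by one sensor propagates to its associates, so the
    # reward drops even when the dangerous input is only predicted.
    assoc_danger = W @ a / max(W.sum(axis=1).max(), 1e-8)
    fused = e * (1.0 - np.clip(assoc_danger, 0.0, 1.0))
    return fused.mean()         # self-generated reward


# Toy run: sensors 0 (touch) and 1 (proximity) fire together during
# dangerous episodes, so proximity alone later lowers the reward.
for _ in range(200):
    step(np.array([0.9, 0.9, 0.1]))     # correlated dangerous inputs
print(step(np.array([0.0, 0.9, 0.1])))  # proximity alone now predicts danger
```

In this toy run, after the correlated episodes the proximity reading depresses the reward even though the touch sensor reports nothing, which is the broadened danger recognition the abstract describes.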
