International Joint Conference on Neural Networks

Learning of binocular fixations using anomaly detection with deep reinforcement learning

Abstract

Due to their ability to learn complex behaviors in high-dimensional state-action spaces, deep reinforcement learning algorithms have attracted much interest in the robotics community. For a practical reinforcement learning implementation on a robot, the agent has to be provided with an informative reward signal that makes it easy to discriminate the values of nearby states. To address this issue, prior information, e.g. in the form of a geometric model, or human supervision is often assumed. This paper proposes a method to learn binocular fixations without such prior information. Instead, it uses an informative reward that requires little supervised information. The reward computation is based on an anomaly detection mechanism that uses convolutional autoencoders. These detectors estimate an object's pixel position in a weakly supervised way. This position estimate is affected by noise, which makes the reward signal noisy. We first show that this noise affects both the learning speed and the resulting policy. We then propose a method to partially remove the noise by regressing the detection change on the sensor data. The binocular fixation task is learned in a simulated environment on a training set of objects with various shapes and colors. The learned policy is compared with one learned from a highly informative, noiseless reward signal. Tests are carried out both on the training set and on a test set of new objects. We observe similar performance, showing that the environment-encoding step can replace the prior information.
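
The sketch below illustrates one plausible way such an anomaly-based reward could be computed: a small convolutional autoencoder trained on object-free background views, whose per-pixel reconstruction error serves as an anomaly map; the centroid of that map gives a (noisy) pixel estimate of the object's position, and the reward penalizes the distance of this estimate from the image center in both eyes. The architecture, image size, and all names (`BackgroundAutoencoder`, `anomaly_centroid`, `fixation_reward`) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of an anomaly-detection-based fixation reward (assumed design,
# not the paper's code). Images are assumed to be 3x64x64 tensors in [0, 1].
import torch
import torch.nn as nn


class BackgroundAutoencoder(nn.Module):
    """Convolutional autoencoder trained on object-free views; pixels it fails
    to reconstruct are treated as 'anomalous' (i.e., belonging to the object)."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),   # 16 -> 32
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(), # 32 -> 64
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


def anomaly_centroid(model, image):
    """Weakly supervised pixel-position estimate: centroid of the per-pixel
    reconstruction error (the anomaly map). The estimate is inherently noisy."""
    with torch.no_grad():
        error = (model(image) - image).pow(2).mean(dim=1, keepdim=True)  # (1,1,H,W)
    h, w = error.shape[-2:]
    ys = torch.arange(h, dtype=torch.float32).view(1, 1, h, 1)
    xs = torch.arange(w, dtype=torch.float32).view(1, 1, 1, w)
    weight = error / (error.sum() + 1e-8)
    return (weight * xs).sum().item(), (weight * ys).sum().item()


def fixation_reward(model, left_img, right_img):
    """Reward: negative distance of the estimated object position from the image
    center, summed over both eyes, so that fixating the object maximizes reward."""
    reward = 0.0
    for img in (left_img, right_img):
        x, y = anomaly_centroid(model, img)
        h, w = img.shape[-2:]
        reward -= ((x - w / 2) ** 2 + (y - h / 2) ** 2) ** 0.5
    return reward
```

Because the anomaly centroid is a noisy position estimate, a reward of this kind would exhibit exactly the noise the paper describes, which motivates the proposed regression-based denoising step.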
