Conference: Annual Conference on Neural Information Processing Systems

Hierarchical Optimistic Region Selection driven by Curiosity

Abstract

This paper aims to take a step toward putting the term "intrinsic motivation" from reinforcement learning on a sound theoretical footing, focusing on curiosity-driven learning. To that end, we consider the following setting: given a fixed partition V of a continuous space X and an unknown process v defined on X, we must sequentially decide which cell of the partition to select and where to sample v within that cell, in order to minimize a loss function inspired by previous work on curiosity-driven learning. The loss on each cell consists of one term measuring a simple worst-case quadratic sampling error and a penalty term proportional to the range of the variance in that cell. This problem formulation extends the setting known as active learning for multi-armed bandits to the case where each arm is a continuous region, and we show how adapting recent algorithms for that problem, together with hierarchical optimistic sampling algorithms for optimization, can be used to solve it. The resulting procedure, called Hierarchical Optimistic Region SElection driven by Curiosity (HORSE.C), is provided together with a finite-time regret analysis.
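To make the sequential decision described in the abstract more concrete, the following is a minimal Python sketch of optimistic cell selection. It is not HORSE.C itself: it ignores the hierarchical refinement inside each cell and the variance-range penalty, samples uniformly within the chosen cell, and treats observations in a cell as i.i.d. The function names, the `confidence_scale` parameter, the form of the confidence bonus, and the toy process `noisy_process` are all assumptions made for illustration.

```python
import math
import random


def noisy_process(x, rng):
    """Hypothetical toy process on X = [0, 1]: zero mean, noisier on the right half."""
    sigma = 0.1 if x < 0.5 else 1.0
    return rng.gauss(0.0, sigma)


def optimistic_cell_selection(cells, process, budget, confidence_scale=1.0, seed=0):
    """Allocate `budget` samples across `cells` (a list of (low, high) intervals).

    Each round picks the cell whose optimistic estimate of variance, divided by
    its current sample count, is largest, i.e. the cell where one more sample
    is expected to reduce the estimation error the most.
    """
    if budget < 2 * len(cells):
        raise ValueError("budget must allow two initial samples per cell")

    rng = random.Random(seed)
    counts = [0] * len(cells)
    sums = [0.0] * len(cells)
    sq_sums = [0.0] * len(cells)

    def pull(k):
        # Sample a location uniformly inside cell k and observe the process there.
        low, high = cells[k]
        x = rng.uniform(low, high)
        y = process(x, rng)
        counts[k] += 1
        sums[k] += y
        sq_sums[k] += y * y

    # Initialisation: two samples per cell so an empirical variance exists.
    for k in range(len(cells)):
        pull(k)
        pull(k)

    for t in range(2 * len(cells), budget):
        def score(k):
            n = counts[k]
            mean = sums[k] / n
            variance = max(sq_sums[k] / n - mean * mean, 0.0)
            # Assumed confidence bonus; it shrinks as the cell is sampled more.
            bonus = confidence_scale * math.sqrt(math.log(t + 1) / n)
            return (variance + bonus) / n

        pull(max(range(len(cells)), key=score))

    return counts


if __name__ == "__main__":
    allocation = optimistic_cell_selection([(0.0, 0.5), (0.5, 1.0)], noisy_process, budget=200)
    print(allocation)
```

With the toy process above, the allocation should favour the noisier right-hand cell, which is the qualitative behaviour expected of active learning in a multi-armed bandit.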