Efficient model-based exploration in continuous state-space environments.

Abstract

The impetus for exploration in reinforcement learning (RL) is decreasing uncertainty about the environment for the purpose of better decision making. As such, exploration plays a crucial role in the efficiency of RL algorithms. In this dissertation, I consider continuous state control problems and introduce a new methodology for representing uncertainty that engenders more efficient algorithms. I argue that the new notion of uncertainty allows for more efficient use of function approximation, which is essential for learning in continuous spaces. In particular, I focus on a class of algorithms referred to as model-based methods and develop several such algorithms that are much more efficient than the current state-of-the-art methods. These algorithms attack the long-standing "curse of dimensionality" -- learning complexity often scales exponentially with problem dimensionality. I introduce algorithms that can exploit the dependency structure between state variables to exponentially decrease the sample complexity of learning, both in cases where the dependency structure is provided by the user a priori and cases where the algorithm has to find it on its own. I also use the new uncertainty notion to derive a multi-resolution exploration scheme, and demonstrate how this new technique achieves anytime behavior, which is very important in real-life applications. Finally, using a set of rich experiments, I show how the new exploration mechanisms affect the efficiency of learning, especially in real-life domains where acquiring samples is expensive.

Bibliographic record

  • Author

    Nouri, Ali.;

  • Author affiliation

    Rutgers The State University of New Jersey - New Brunswick.;

  • Degree-granting institution: Rutgers The State University of New Jersey - New Brunswick.;
  • Subject: Computer Science.
  • Degree: Ph.D.
  • Year: 2011
  • Pagination: 183 p.
  • Total pages: 183
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification:
  • Keywords:
