Reinforcement learning in environments with independent delayed-sense dynamics.

Abstract

This thesis is a detailed investigation of reinforcement learning in environments with independent delayed-sense dynamics (IDSD), in which some of the state variables evolve independently of both the agent's actions and the other state variables, and can be sensed only after a delay. These independent state variables are analogous to disturbances: they are unaffected by control actions and cannot be observed before the agent commits to a course of action.

In this thesis, we first formalize IDSD problems and then develop four reinforcement learning algorithms that exploit the structure of IDSD problems to achieve better efficiency. Two of the algorithms are partially model-based and two are model-free. We argue that, for the same amount of experience, the policies learned by the proposed algorithms are of higher quality than those learned by conventional reinforcement learning algorithms.

We demonstrate the effectiveness of our algorithms on traffic grid-world problems and on a hybrid vehicle problem, in which traffic and driver acceleration, respectively, play the role of the independent state variable. We show experimentally that our algorithms evaluate a given policy more accurately than the corresponding TD(0) method. We also show that, in the control setting, our algorithms learn substantially faster than conventional reinforcement learning algorithms that do not use knowledge of the IDSD structure.
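The abstract does not spell out the algorithms, so the following is only a minimal sketch, under stated assumptions, of what the IDSD structure looks like in code, paired with a conventional tabular TD(0) baseline of the kind the abstract compares against. Everything here is hypothetical and chosen for illustration: the class `ToyIDSDEnv`, its dynamics, the parameters `n_positions`, `delay`, and `p_switch`, and the helper `td0_evaluate`. None of it is taken from the thesis, and the thesis' four IDSD algorithms are not reproduced.

```python
import random


class ToyIDSDEnv:
    """Hypothetical toy environment with IDSD-style structure (illustration only).

    State is (x, d):
      x: controllable variable, driven by the agent's action and by d
      d: independent variable, a two-state Markov chain that ignores both
         the action and x (a stand-in for something like traffic)
    The agent observes x immediately but observes d only `delay` steps late.
    """

    def __init__(self, n_positions=5, delay=2, p_switch=0.3, seed=0):
        self.n = n_positions
        self.delay = delay
        self.p_switch = p_switch
        # Dedicated RNG: the d trajectory depends only on the seed, never on actions.
        self.rng_d = random.Random(seed)
        self.reset()

    def reset(self):
        self.t = 0
        self.x = 0
        self.d_history = [0]  # d_history[k] is the value of d at time k
        return self._observe()

    def _observe(self):
        # x is current; d is the value from `delay` steps ago (clamped at time 0).
        return (self.x, self.d_history[max(0, self.t - self.delay)])

    def step(self, action):
        d = self.d_history[-1]
        # Controllable dynamics (assumed): action 1 moves right unless d == 1
        # ("blocked"); action 0 moves left.
        if action == 1 and d == 0:
            self.x = min(self.x + 1, self.n - 1)
        elif action == 0:
            self.x = max(self.x - 1, 0)
        reward = 1.0 if self.x == self.n - 1 else 0.0
        # Independent dynamics: neither the action nor x appears in this update.
        next_d = 1 - d if self.rng_d.random() < self.p_switch else d
        self.d_history.append(next_d)
        self.t += 1
        return self._observe(), reward


def td0_evaluate(env, policy, episodes=200, steps=20, alpha=0.1, gamma=0.9):
    """Tabular TD(0) on observed (x, delayed d) pairs: a conventional baseline
    that does not exploit the independence of d."""
    V = {}
    for _ in range(episodes):
        obs = env.reset()
        for _ in range(steps):
            next_obs, r = env.step(policy(obs))
            v = V.get(obs, 0.0)
            V[obs] = v + alpha * (r + gamma * V.get(next_obs, 0.0) - v)
            obs = next_obs
    return V


if __name__ == "__main__":
    env = ToyIDSDEnv(delay=2, seed=0)
    values = td0_evaluate(env, policy=lambda obs: 1)
    for state in sorted(values):
        print(state, round(values[state], 3))
```

Because the independent variable is drawn from its own random stream and its update never reads the action, two environments with the same seed produce the same `d` trajectory whatever the agent does. That independence is the kind of structure the abstract says IDSD algorithms exploit, whereas the plain TD(0) baseline treats the observed pair (x, delayed d) as an ordinary state.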

Bibliographic details

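  • Author

    Shahamiri, Masoud.

  • Affiliation

    University of Alberta (Canada).

  • Degree-granting institution: University of Alberta (Canada).
  • Discipline: Computer Science.
  • Degree: M.Sc.
  • Year: 2008
  • Pages: 54
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Automation technology, computer technology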