Foreign-language conference proceedings: Machine Learning (ML95)

TD Models: Modeling the World at a Mixture of Time Scales

Abstract

Temporal-difference (TD) learning can be used not just to predict rewards, as is commonly done in reinforcement learning, but also to predict states, i.e., to learn a model of the world's dynamics. We present theory and algorithms for intermixing TD models of the world at different levels of temporal abstraction within a single structure. Such multi-scale TD models can be used in model-based reinforcement-learning architectures and dynamic programming methods in place of conventional Markov models. This enables planning at higher and varied levels of abstraction, and, as such, may prove useful in formulating methods for hierarchical or multi-level planning and reinforcement learning. In this paper we treat only the prediction problem: learning a model and value function for the case of fixed agent behavior. Within this context, we establish the theoretical foundations of multi-scale models and derive TD algorithms for learning them. Two small computational experiments are presented to test and illustrate the theory. This work is an extension and generalization of the work of Singh (1992), Dayan (1993), and Sutton & Pinette (1985).
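The idea of using TD learning to predict states rather than rewards can be illustrated with a minimal sketch. It is not the paper's multi-scale algorithm: it assumes a hypothetical four-state ring that the agent always traverses in one direction (fixed behavior), a single discount factor `gamma`, and tabular TD(0) updates, which is closer in spirit to the single-scale state predictions of Dayan (1993). All function names, parameters, and the environment are illustrative assumptions.

```python
def learn_state_predictions(n_states=4, gamma=0.9, alpha=0.5, sweeps=500):
    """Tabular TD(0) estimate of discounted future state occupancy.

    M[s][t] approximates sum over k >= 0 of gamma**k * Pr(state after k steps is t),
    starting from s, on a deterministic ring (an assumed toy environment).
    """
    M = [[0.0] * n_states for _ in range(n_states)]
    s = 0
    for _ in range(sweeps * n_states):
        s_next = (s + 1) % n_states  # fixed behavior: always step to the next state
        for t in range(n_states):
            # TD target: occupancy of t right now, plus the discounted
            # prediction already held for the successor state.
            target = (1.0 if t == s else 0.0) + gamma * M[s_next][t]
            M[s][t] += alpha * (target - M[s][t])
        s = s_next
    return M


def exact_state_predictions(n_states=4, gamma=0.9):
    """Closed form for the same ring: state t is revisited every n_states steps."""
    return [
        [gamma ** ((t - s) % n_states) / (1.0 - gamma ** n_states)
         for t in range(n_states)]
        for s in range(n_states)
    ]


def values_from_model(M, rewards):
    """Value of each state as the learned state predictions weighted by reward."""
    return [sum(m * r for m, r in zip(row, rewards)) for row in M]


if __name__ == "__main__":
    M = learn_state_predictions()
    E = exact_state_predictions()
    worst = max(abs(M[s][t] - E[s][t]) for s in range(4) for t in range(4))
    print("largest gap between learned and exact predictions:", worst)
    print("values for reward 1 in state 3:", values_from_model(M, [0, 0, 0, 1]))
```

Once the state predictions are learned, a value function for any reward vector follows from a weighted sum, which is one way to see how a learned model can stand in for a conventional one-step Markov model in prediction.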
