Temporal Abstraction in Temporal-difference Networks

Abstract

We present a generalization of temporal-difference networks to include temporally abstract options on the links of the question network. Temporal-difference (TD) networks have been proposed as a way of representing and learning a wide variety of predictions about the interaction between an agent and its environment. These predictions are compositional in that their targets are defined in terms of other predictions, and subjunctive in that they are about what would happen if an action or sequence of actions were taken. In conventional TD networks, the inter-related predictions are at successive time steps and contingent on a single action; here we generalize them to accommodate extended time intervals and contingency on whole ways of behaving. Our generalization is based on the options framework for temporal abstraction. The primary contribution of this paper is to introduce a new algorithm for intra-option learning in TD networks with function approximation and eligibility traces. We present empirical examples of our algorithm's effectiveness and of the greater representational expressiveness of temporally abstract TD networks.
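For readers unfamiliar with the ingredients named in the abstract, the following is a minimal sketch of ordinary TD(λ) with linear (one-hot) function approximation and accumulating eligibility traces on a small random-walk chain. It is not the paper's intra-option TD-network algorithm, which the abstract does not specify; the function name `td_lambda_random_walk`, the chain environment, and all parameter values are assumptions chosen purely for illustration.

```python
import random


def td_lambda_random_walk(episodes=3000, n_states=5, alpha=0.02,
                          gamma=1.0, lam=0.8, seed=0):
    """Illustrative TD(lambda) on a random walk (hypothetical example).

    Each weight estimates the probability of exiting the chain on the
    right when starting from that state. Features are one-hot, so the
    weight vector doubles as the table of value estimates.
    """
    rng = random.Random(seed)
    w = [0.0] * n_states              # one weight per state
    for _ in range(episodes):
        z = [0.0] * n_states          # eligibility trace, reset each episode
        s = n_states // 2             # start in the middle of the chain
        while True:
            s_next = s + rng.choice((-1, 1))
            done = s_next < 0 or s_next >= n_states
            reward = 1.0 if s_next >= n_states else 0.0
            v_next = 0.0 if done else w[s_next]
            delta = reward + gamma * v_next - w[s]   # one-step TD error
            # Decay every trace, then mark the current state as eligible.
            z = [gamma * lam * zi for zi in z]
            z[s] += 1.0
            # Spread the TD error over recently visited states.
            w = [wi + alpha * delta * zi for wi, zi in zip(w, z)]
            if done:
                break
            s = s_next
    return w


if __name__ == "__main__":
    print([round(v, 2) for v in td_lambda_random_walk()])
```

In this toy setting the prediction target at each step is built from the next step's prediction, which is the compositional idea the abstract describes; the paper extends that idea to targets spanning whole options rather than single actions.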
