
Temporal-Difference Networks with History


Abstract

Temporal-difference (TD) networks are a formalism for expressing and learning grounded world knowledge in a predictive form [Sutton and Tanner, 2005]. However, not all partially observable Markov decision processes can be efficiently learned with TD networks. In this paper, we extend TD networks by allowing the network-update process (answer network) to depend on the recent history of previous actions and observations rather than only on the most recent action and observation. We show that this extension enables the solution of a larger class of problems than can be solved by the original TD networks or by history-based methods alone. In addition, we apply TD networks to a problem that, while still simple, is significantly larger than has previously been considered. We show that history-extended TD networks can learn much of the common-sense knowledge of an egocentric gridworld domain with a single bit of perception.
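The extension described above can be illustrated with a minimal sketch (hypothetical code, not the authors' implementation): the answer network's update is keyed on a tuple of the last k action-observation pairs instead of only the most recent pair. For simplicity the sketch uses a tabular answer network and a degenerate question network in which every prediction node targets the next observation.

```python
import random
from collections import deque, defaultdict

K = 2                # history length -- the extension over original TD networks
N_PREDICTIONS = 3    # number of prediction nodes in the (toy) question network
ALPHA = 0.1          # step size

# Tabular answer network: weights[history][i] is the parameter for
# prediction node i, where `history` is the last K (action, observation)
# pairs rather than only the most recent one.
weights = defaultdict(lambda: [0.5] * N_PREDICTIONS)

def answer(history):
    """Answer network: map the recent history to a prediction vector."""
    return list(weights[history])

def td_update(history, targets):
    """Move each history-conditioned prediction toward its target."""
    w = weights[history]
    for i, t in enumerate(targets):
        w[i] += ALPHA * (t - w[i])

# Toy usage: a random stream of (action, observation) pairs with a
# single bit of perception; each node's target is the next observation.
random.seed(0)
hist = deque(maxlen=K)
for _ in range(200):
    a, o = random.choice("LR"), random.choice([0, 1])
    if len(hist) == K:
        td_update(tuple(hist), [o] * N_PREDICTIONS)
    hist.append((a, o))

predictions = answer(tuple(hist))  # predictions conditioned on recent history
```

With K = 1 this reduces to the original TD-network setting, where the answer network sees only the single most recent action and observation.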
