
Temporal-Difference Networks with History


Abstract

Temporal-difference (TD) networks are a formalism for expressing and learning grounded world knowledge in a predictive form [Sutton and Tanner, 2005]. However, not all partially observable Markov decision processes can be efficiently learned with TD networks. In this paper, we extend TD networks by allowing the network-update process (answer network) to depend on the recent history of previous actions and observations rather than only on the most recent action and observation. We show that this extension enables the solution of a larger class of problems than can be solved by the original TD networks or by history-based methods alone. In addition, we apply TD networks to a problem that, while still simple, is significantly larger than has previously been considered. We show that history-extended TD networks can learn much of the common-sense knowledge of an egocentric gridworld domain with a single bit of perception.
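The extension described above can be illustrated with a minimal sketch (hypothetical code, not the authors' implementation): the answer network's update is keyed on a tuple of the last k action-observation pairs instead of only the most recent pair. For simplicity the sketch uses a tabular answer network and a degenerate question network in which every prediction node targets the next observation.

```python
import random
from collections import deque, defaultdict

K = 2                # history length -- the extension over original TD networks
N_PREDICTIONS = 3    # number of prediction nodes in the (toy) question network
ALPHA = 0.1          # step size

# Tabular answer network: weights[history][i] is the parameter for
# prediction node i, where `history` is the last K (action, observation)
# pairs rather than only the most recent one.
weights = defaultdict(lambda: [0.5] * N_PREDICTIONS)

def answer(history):
    """Answer network: map the recent history to a prediction vector."""
    return list(weights[history])

def td_update(history, targets):
    """Move each history-conditioned prediction toward its target."""
    w = weights[history]
    for i, t in enumerate(targets):
        w[i] += ALPHA * (t - w[i])

# Toy usage: a random stream of (action, observation) pairs with a
# single bit of perception; each node's target is the next observation.
random.seed(0)
hist = deque(maxlen=K)
for _ in range(200):
    a, o = random.choice("LR"), random.choice([0, 1])
    if len(hist) == K:
        td_update(tuple(hist), [o] * N_PREDICTIONS)
    hist.append((a, o))

predictions = answer(tuple(hist))  # predictions conditioned on recent history
```

With K = 1 this reduces to the original TD-network setting, where the answer network sees only the single most recent action and observation.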
