TD(λ) Networks: Temporal-Difference Networks with Eligibility Traces

Abstract

Temporal-difference (TD) networks have been introduced as a formalism for expressing and learning grounded world knowledge in a predictive form (Sutton & Tanner, 2005). Like conventional TD(0) methods, the learning algorithm for TD networks uses 1-step backups to train prediction units about future events. In conventional TD learning, the TD(λ) algorithm is often used to do more general multi-step backups of future predictions. In our work, we introduce a generalization of the 1-step TD network specification that is based on the TD(λ) learning algorithm, creating TD(λ) networks. We present experimental results that show TD(λ) networks can learn solutions in more complex environments than TD networks. We also show that in problems that can be solved by TD networks, TD(λ) networks generally learn solutions much faster than their 1-step counterparts. Finally, we present an analysis of our algorithm that shows that the computational cost of TD(λ) networks is only slightly more than that of TD networks.
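The abstract contrasts the 1-step backups of TD(0) with the multi-step backups of conventional TD(λ). The sketch below illustrates only that conventional algorithm, not the TD(λ)-network algorithm the paper proposes. It uses linear TD(λ) prediction with accumulating eligibility traces. The function name `td_lambda`, the transition format, and the defaults for the step size `alpha`, the discount `gamma`, and the trace decay `lam` are hypothetical choices made for illustration. Setting `lam=0` recovers the 1-step TD(0) update.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical transition format: (features, reward, next_features),
# where next_features is None when the episode terminates.
Transition = Tuple[Sequence[float], float, Optional[Sequence[float]]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def td_lambda(episodes: List[List[Transition]], n_features: int,
              alpha: float = 0.1, gamma: float = 1.0,
              lam: float = 0.8) -> List[float]:
    """Linear TD(lambda) prediction with accumulating eligibility traces.

    With lam=0 the trace is just the current feature vector, so this
    reduces to the 1-step TD(0) update.
    """
    w = [0.0] * n_features
    for episode in episodes:
        e = [0.0] * n_features  # eligibility trace, reset at episode start
        for x, r, x_next in episode:
            v = dot(w, x)
            v_next = 0.0 if x_next is None else dot(w, x_next)
            delta = r + gamma * v_next - v  # 1-step TD error
            # Decay old traces and add the current features.
            e = [gamma * lam * ei + xi for ei, xi in zip(e, x)]
            # Apply the TD error to every recently visited feature.
            w = [wi + alpha * delta * ei for wi, ei in zip(w, e)]
    return w


if __name__ == "__main__":
    # Three-state chain with one-hot features and reward 1 on termination.
    s = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    episode = [(s[0], 0.0, s[1]), (s[1], 0.0, s[2]), (s[2], 1.0, None)]
    print("TD(0),   one episode:", td_lambda([episode], 3, lam=0.0))
    print("TD(0.8), one episode:", td_lambda([episode], 3, lam=0.8))
```

On this toy chain, one episode with `lam=0` changes only the estimate of the final state. With `lam=0.8`, the same episode also moves the estimates of the two earlier states, which shows the faster credit propagation that multi-step backups provide. The only extra state is the trace vector `e`. It is updated once per step at the same O(n) cost as the weights, which fits the abstract's claim that the added computational cost is small.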

Bibliographic Record

Source: International Conference on Machine Learning (ICML), 2005, 8 pages
Authors: Brian Tanner; Richard S. Sutton
Original format: PDF
Chinese Library Classification: TP181-53

Similar Literature

1. VNE-TD: A virtual network embedding algorithm based on temporal-difference learning [J]. Wang Sen, Bi Jun, Wu Jianping. Computer Networks, 2019, no. Octa9.
2. Cancer cells population control in a delayed-model of a leukemic patient using the combination of the eligibility traces algorithm and neural networks [J]. Kalhor Elnaz, Noori Amin, Noori Ghazaleh. International Journal of Machine Learning and Cybernetics, 2021, no. 7.
3. TD(λ) Networks: Temporal-Difference Networks with Eligibility Traces [C]. Brian Tanner, Richard S. Sutton. International Conference on Machine Learning, 2005.
4. Temporal-difference networks [D]. Tanner, Brian Timothy. 2005.
5. Dopamine Cells Respond to Predicted Events during Classical Conditioning: Evidence for Eligibility Traces in the Reward-Learning Network [O]. Wei-Xing Pan, Robert Schmidt, Jeffery R. Wickens. 2005.
6. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces [O]. Hamid Reza Maei, Richard S. Sutton. 2010.