Conference: International Conference on Machine Learning

TD(λ) Networks: Temporal-Difference Networks with Eligibility Traces

Abstract

Temporal-difference (TD) networks have been introduced as a formalism for expressing and learning grounded world knowledge in a predictive form (Sutton & Tanner, 2005). Like conventional TD(0) methods, the learning algorithm for TD networks uses 1-step backups to train prediction units about future events. In conventional TD learning, the TD(λ) algorithm is often used to do more general multi-step backups of future predictions. In our work, we introduce a generalization of the 1-step TD network specification that is based on the TD(λ) learning algorithm, creating TD(λ) networks. We present experimental results that show TD(λ) networks can learn solutions in more complex environments than TD networks. We also show that in problems that can be solved by TD networks, TD(λ) networks generally learn solutions much faster than their 1-step counterparts. Finally, we present an analysis of our algorithm that shows that the computational cost of TD(λ) networks is only slightly more than that of TD networks.
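
For readers unfamiliar with the conventional TD(λ) algorithm that the abstract builds on, the following is a minimal sketch of tabular TD(λ) with accumulating eligibility traces. It is not the TD(λ) network algorithm from the paper. The random-walk environment, the function name `td_lambda_random_walk`, and all parameter values are assumptions chosen for illustration only.

```python
import random


def td_lambda_random_walk(num_states=5, episodes=5000, alpha=0.02,
                          gamma=1.0, lam=0.8, seed=0):
    """Estimate state values of a random walk with tabular TD(lambda).

    Hypothetical setup: states 0..num_states-1 in a line, start in the
    middle, step left or right with equal probability. Leaving on the
    right gives reward 1, leaving on the left gives reward 0.
    """
    rng = random.Random(seed)
    values = [0.0] * num_states
    for _ in range(episodes):
        traces = [0.0] * num_states  # eligibility traces reset each episode
        state = num_states // 2
        while True:
            next_state = state + rng.choice((-1, 1))
            done = next_state < 0 or next_state >= num_states
            reward = 1.0 if next_state >= num_states else 0.0
            next_value = 0.0 if done else values[next_state]

            # 1-step TD error, the same quantity TD(0) would use.
            td_error = reward + gamma * next_value - values[state]

            # Decay every trace, then mark the current state as eligible.
            for s in range(num_states):
                traces[s] *= gamma * lam
            traces[state] += 1.0

            # Every recently visited state shares in the current TD error,
            # which is what makes the backup multi-step.
            for s in range(num_states):
                values[s] += alpha * td_error * traces[s]

            if done:
                break
            state = next_state
    return values


if __name__ == "__main__":
    print(td_lambda_random_walk())
```

Setting `lam=0.0` zeroes every trace except the current state's, which recovers the 1-step TD(0) update. The only extra work per step relative to TD(0) is the decay and update over the trace vector, which gives some intuition for why adding traces is usually cheap in the conventional setting.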
