
TD Models: Modeling the World at a Mixture of Time Scales

Abstract

Temporal-difference (TD) learning can be used not just to predict rewards, as is commonly done in reinforcement learning, but also to predict states, i.e., to learn a model of the world's dynamics. We present theory and algorithms for intermixing TD models of the world at different levels of temporal abstraction within a single structure. Such multi-scale TD models can be used in model-based reinforcement-learning architectures and dynamic programming methods in place of conventional Markov models. This enables planning at higher and varied levels of abstraction, and, as such, may prove useful in formulating methods for hierarchical or multi-level planning and reinforcement learning. In this paper we treat only the prediction problem, that of learning a model and value function for the case of fixed agent behavior. Within this context, we establish the theoretical foundations of multi-scale models and derive TD algorithms for learning them. Two small computational experiments are presented to test and illustrate the theory. This work is an extension and generalization of the work of Singh (1992), Dayan (1993), and Sutton & Pinette (1985).
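
The idea of using TD learning to predict states rather than rewards can be illustrated with a minimal sketch. It is not the algorithm derived in the paper; it is a plain tabular TD(0) update, in the spirit of the state-occupancy predictions of Dayan (1993), that learns the discounted future occupancy of every state under fixed behavior. The function name `learn_td_state_model`, the deterministic four-state cycle, and the parameters `gamma`, `alpha`, and `steps` are all assumptions made for illustration. Here `gamma` plays the role of a time scale: `gamma = 0` recovers an ordinary one-step transition model, and larger values predict further into the future.

```python
def one_hot(i, n):
    """Indicator vector for state i among n states."""
    return [1.0 if j == i else 0.0 for j in range(n)]


def learn_td_state_model(next_state, n_states, gamma, alpha=0.5, steps=5000, start=0):
    """Tabular TD(0) estimate of discounted future state occupancy.

    M[s][j] approximates the expected sum over k >= 1 of
    gamma**(k - 1) * [state at step k == j], starting from state s and
    following the fixed behavior encoded by next_state (hypothetical helper).
    """
    M = [[0.0] * n_states for _ in range(n_states)]
    s = start
    for _ in range(steps):
        s_next = next_state(s)
        observed = one_hot(s_next, n_states)
        for j in range(n_states):
            # The TD error uses the observed next state in place of a reward.
            td_error = observed[j] + gamma * M[s_next][j] - M[s][j]
            M[s][j] += alpha * td_error
        s = s_next
    return M


# Illustrative environment: a deterministic cycle 0 -> 1 -> 2 -> 3 -> 0.
N_STATES = 4


def cycle(s):
    return (s + 1) % N_STATES


one_step_model = learn_td_state_model(cycle, N_STATES, gamma=0.0)
long_horizon_model = learn_td_state_model(cycle, N_STATES, gamma=0.9)
```

On this deterministic cycle the learned rows can be checked against the closed form `gamma**(d - 1) / (1 - gamma**N_STATES)`, where `d` is the number of steps needed to first reach the target state. In a stochastic environment a constant step size would leave residual noise, so a decaying `alpha` would be the more usual choice.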


Bibliographic record

Source: Machine learning (ML95), 1995, pp. 531-539, 9 pages
Conference location: Tahoe City, CA (US)
Author: Richard S. Sutton
Original format: PDF
Language: English
Chinese Library Classification: automation technology, computer technology

Similar literature

1. Analysis of real-time mixture cytotoxicity data following repeated exposure using BK/TD models [J]. Teng S., Tebby C., Barcellini-Couget S. Toxicology and Applied Pharmacology, 2016.
2. Modelling financial time series based on heavy-tailed market microstructure models with scale mixtures of normal distributions [J]. Xi Yanhui, Peng Hui. International Journal of Systems Science, 2018, no. 5a8.
3. Modelling fluorescence lifetimes with TD-DFT: a case study with syn-bimanes [J]. Wong Z. C., Fan W. Y., Chwee T. S. RSC Advances, 2016, no. 90.
4. TD Models: Modeling the World at a Mixture of Time Scales [C]. Richard S. Sutton. International Conference on Machine Learning, 1995.
5. A Novel Multi-Scale Model of Time-Scale Integration for Modeling the Hemodynamics of the Cardiovascular System [D]. Leon, Jessica Caitlin. 2017.
6. Coupling Toxicokinetic-Toxicodynamic (TK-TD) and Population Models for Assessing Aquatic Ecological Risks to Time-Varying Pesticide Exposures [O]. Glen Thursby, Keith Sappington, Mathew Etterson.
7. TD Kernel DM+V: time-dependent statistical gas distribution modelling on simulated measurements [O]. Asadi, Sahar, Pashami, Sepideh, Loutfi, Amy. 2011.
