Comparing Direct and Indirect Temporal-Difference Methods for Estimating the Variance of the Return

Abstract

Temporal-difference (TD) learning methods are widely used in reinforcement learning to estimate the expected return for each state, without a model, because of their significant advantages in computational and data efficiency. For many applications involving risk mitigation, it would also be useful to estimate the variance of the return by TD methods. In this paper, we describe a way of doing this that is substantially simpler than those proposed by Tamar, Di Castro, and Mannor in 2012, or those proposed by White and White in 2016. We show that two TD learners operating in series can learn expectation and variance estimates. The trick is to use the square of the TD error of the expectation learner as the reward of the variance learner, and the square of the expectation learner's discount rate as the discount rate of the variance learner. With these two modifications, the variance learning problem becomes a conventional TD learning problem to which standard theoretical results can be applied. Our formal results are limited to the table lookup case, for which our method is still novel, but the extension to function approximation is immediate, and we provide some empirical results for the linear function approximation case. Our experimental results show that our direct method behaves just as well as a comparable indirect method, but is generally more robust.
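
The two-learner construction described in the abstract can be illustrated with a minimal tabular TD(0) sketch. This is not the authors' code. The two-state chain, the ±1 rewards, the step size, and the names `learn_value_and_variance`, `V`, and `M` are assumptions chosen for illustration. Only the two modifications come from the abstract: the variance learner's reward is the squared TD error of the expectation learner, and its discount rate is the square of the expectation learner's discount rate.

```python
import random


def learn_value_and_variance(num_episodes=20000, alpha=0.02, gamma=0.9, seed=0):
    """Tabular TD(0) sketch: one learner for the expected return (V),
    a second learner in series for the variance of the return (M).

    Hypothetical environment: a chain 0 -> 1 -> terminal where every
    step pays a reward of -1 or +1 with equal probability.
    """
    rng = random.Random(seed)
    n_states = 2
    V = [0.0] * n_states  # expectation estimates
    M = [0.0] * n_states  # variance estimates
    gamma_var = gamma ** 2  # discount rate of the variance learner

    for _ in range(num_episodes):
        for s in range(n_states):
            r = rng.choice((-1.0, 1.0))
            terminal = s == n_states - 1
            v_next = 0.0 if terminal else V[s + 1]
            m_next = 0.0 if terminal else M[s + 1]

            # Expectation learner: ordinary TD(0) update.
            delta = r + gamma * v_next - V[s]
            V[s] += alpha * delta

            # Variance learner: the same TD(0) update, but with the
            # squared TD error as reward and gamma squared as discount.
            delta_var = delta ** 2 + gamma_var * m_next - M[s]
            M[s] += alpha * delta_var

    return V, M


if __name__ == "__main__":
    V, M = learn_value_and_variance()
    # Analytic targets for this chain: the expected return is 0 in both
    # states; the variance is 1.0 in state 1 and 1 + 0.9**2 = 1.81 in state 0.
    print("V:", V)
    print("M:", M)
```

With a constant step size the estimates fluctuate around the analytic targets rather than converging exactly, so they should be compared with a tolerance.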

Bibliographic details

Source
Conference on Uncertainty in Artificial Intelligence | 2018 | 539p | 10 pages
Authors
Craig Sherstan; Dylan R. Ashley; Brendan Bennett; Kenny Young; Adam White; Martha White; Richard S. Sutton
Original format: PDF
Chinese Library Classification: TP18-53
