Combining Importance Sampling and Temporal Difference Control Variates To Simulate Markov Chains

R. S. Randhawa; S. Juneja

Abstract

It is well known that, in estimating performance measures associated with a stochastic system, a good importance sampling (IS) distribution can give orders of magnitude of variance reduction, while a bad one may lead to large, even infinite, variance. In this paper we study how this sensitivity of the estimator variance to the importance sampling change of measure may be "dampened" by combining importance sampling with a stochastic-approximation-based temporal difference (TD) method. We consider a finite state space discrete time Markov chain (DTMC) with one-step transition rewards and an absorbing set of states, and focus on estimating the cumulative expected reward to absorption starting from any state. In this setting we develop sufficient conditions under which the estimate resulting from the combined approach has a mean square error that asymptotically equals zero, even when the estimate formed by using only the importance sampling change of measure has infinite variance. In particular, we consider the problem of estimating the small buffer overflow probability in a queuing network, where the change of measure suggested in the literature is shown to have infinite variance under certain parameters, and where the appropriate combination of the IS and TD methods can be empirically seen to have a much faster convergence rate than naive simulation.
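The setting in the abstract can be illustrated with a small sketch. It is not the authors' algorithm (in particular, it keeps the value guess `V` fixed rather than updating it by stochastic approximation). It assumes a toy DTMC: a random walk on states 0..N, absorbing at 0 and N, with reward 1 on the transition into N, so the expected cumulative reward is the probability of reaching N before 0. The names `exact_hit_prob`, `simulate_path`, `estimate`, the IS up-probability `q`, and the guess `V` are all hypothetical. Each path is simulated under `q`, and the sketch returns both the plain IS estimate and an estimate that starts from `V[x0]` and adds likelihood-ratio-weighted temporal differences.

```python
import random


def exact_hit_prob(x, N, p):
    """Exact P(reach N before 0 | start at x) for up-probability p != 0.5."""
    rho = (1.0 - p) / p
    return (1.0 - rho ** x) / (1.0 - rho ** N)


def simulate_path(x0, N, p, q, V, rng):
    """Simulate one path under the IS up-probability q (assumed 0 < q < 1).

    V is a guess of the value function with V[0] == V[N] == 0.
    Returns (plain IS estimate, IS + temporal-difference estimate).
    """
    x, L = x0, 1.0          # L is the running likelihood ratio
    is_est = 0.0
    td_est = V[x0]          # start from the guess, then add weighted TDs
    while 0 < x < N:
        up = rng.random() < q
        y = x + 1 if up else x - 1
        L *= (p / q) if up else ((1.0 - p) / (1.0 - q))
        r = 1.0 if y == N else 0.0          # one-step transition reward
        is_est += L * r
        td_est += L * (r + V[y] - V[x])     # weighted temporal difference
        x = y
    return is_est, td_est


def estimate(x0, N, p, q, V, n_paths, seed=0):
    """Average both per-path estimates over n_paths independent paths."""
    rng = random.Random(seed)
    sum_is = sum_td = 0.0
    for _ in range(n_paths):
        a, b = simulate_path(x0, N, p, q, V, rng)
        sum_is += a
        sum_td += b
    return sum_is / n_paths, sum_td / n_paths


if __name__ == "__main__":
    N, p, q = 5, 0.3, 0.7
    # Illustrative guess: half of the exact values at the interior states.
    V = [0.0] + [0.5 * exact_hit_prob(x, N, p) for x in range(1, N)] + [0.0]
    print(exact_hit_prob(1, N, p), estimate(1, N, p, q, V, 20000))
```

With `V` identically zero the second estimate reduces to the plain IS estimate, since each weighted temporal difference is then just the weighted reward. Both estimates are unbiased for this toy chain, and the sketch makes no claim about which has lower variance.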

Bibliographic details

Source
ACM Transactions on Modeling and Computer Simulation | 2004, Issue 1 | pp. 1-30 | 30 pages
Authors
R. S. Randhawa; S. Juneja
Affiliations

Stanford University;

Tata Institute of Fundamental Research;

Indexing information
Original format: PDF
Language: English
CLC classification: Computing technology, computer technology
Keywords
Importance sampling; temporal difference methods; rare events; stochastic approximation; Markov chains; variance reduction;
