Annual Allerton Conference on Communication, Control, and Computing

Markov processes: Estimation in the undersampled regime



Abstract

We observe a length-n sample generated by an unknown, stationary ergodic Markov process (model) over a finite alphabet A. In this paper, we do not assume any bound on the memory of the source, nor do we assume that the source is rapidly mixing. Rather, we consider the class M_d of all Markov sources for which, for every i ∈ ℕ, the mutual information between symbols i apart, conditioned on all symbols in between, is bounded by log(1 + d(i)). Given any string w of symbols from A and an unknown source in M_d, we want estimates of the conditional probability distribution of symbols following w (the model parameters), as well as the stationary probability of w. In this setting, estimators of model parameters can only converge to the underlying truth in a pointwise sense over M_d. Can we, however, look at a length-n sample and identify whether an estimate is likely to be accurate? In this paper we specifically address the case where d(i) diminishes exponentially with i. Since the memory is unknown a priori, a natural approach is to estimate a potentially coarser model with memory k_n = O(log n). As n grows, the estimates get refined; this approach is consistent, and the above scaling of k_n is known to be essentially optimal. But while effective asymptotically, the situation is quite different when we want the best answers possible from a length-n sample, rather than just consistency. Combining results in universal compression with Aldous' coupling arguments, we obtain sufficient conditions on the length-n sample (even for slow-mixing models) to identify when naive (i) estimates of the model parameters and (ii) estimates related to the stationary probabilities are accurate, and also bound the deviations of the naive estimates from the true values.
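The "naive" estimates the abstract refers to can be illustrated with a minimal plug-in sketch (this is illustrative only, not the paper's certification method, and the function name and example data are invented for the sketch): empirical counts of a context w in the length-n sample yield both the conditional next-symbol distribution and the stationary probability of w.

```python
from collections import Counter

def naive_markov_estimates(sample, w):
    """Plug-in (naive) estimates from a length-n sample over a finite alphabet:
    - the empirical conditional distribution of the symbol following context w,
    - the empirical stationary probability of w (its window frequency).
    In the paper's setting w would have length k_n = O(log n)."""
    n, k = len(sample), len(w)
    # Stationary probability: fraction of length-k windows equal to w.
    occurrences = sum(1 for i in range(n - k + 1) if sample[i:i + k] == w)
    p_w = occurrences / (n - k + 1)
    # Conditional distribution: counts of the symbol immediately after w.
    next_counts = Counter(sample[i + k] for i in range(n - k)
                          if sample[i:i + k] == w)
    total = sum(next_counts.values())
    cond = {a: c / total for a, c in next_counts.items()} if total else {}
    return cond, p_w

# Deterministic alternating source: after "0" the next symbol is always "1",
# and "0" fills half of all length-1 windows.
cond, p_w = naive_markov_estimates("01" * 50, "0")
# cond == {"1": 1.0}, p_w == 0.5
```

Whether such counts are trustworthy for a given sample, especially for slow-mixing sources, is exactly what the paper's sufficient conditions are designed to decide.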
