Rediscovery of Good-Turing estimators via Bayesian Nonparametrics

Abstract

The problem of estimating discovery probabilities originated in statistical ecology, and in recent years it has become popular because it arises frequently in challenging applications in genetics, bioinformatics, linguistics, design of experiments, and machine learning. A full range of statistical approaches, parametric and nonparametric as well as frequentist and Bayesian, has been proposed for estimating discovery probabilities. In this article, we investigate the relationship between the celebrated Good-Turing approach, a frequentist nonparametric approach developed in the 1940s, and a Bayesian nonparametric approach recently introduced in the literature. Specifically, under the assumption of a two-parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric estimators of discovery probabilities are asymptotically equivalent, for large sample sizes, to suitably smoothed Good-Turing estimators. As a by-product of this result, we introduce and investigate a methodology for deriving exact and asymptotic credible intervals to be associated with the Bayesian nonparametric estimators of discovery probabilities. The proposed methodology is illustrated through a comprehensive simulation study and through the analysis of Expressed Sequence Tags data generated by sequencing a benchmark complementary DNA library.
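
To make the two families of estimators concrete, the following is a minimal sketch, not the authors' implementation. It assumes the classical (unsmoothed) Good-Turing form (l + 1) m_{l+1} / n for the probability that the next observation is a species seen exactly l times, and the standard predictive rule of the two-parameter Poisson-Dirichlet (Pitman-Yor) prior for the Bayesian point estimate. The function names are hypothetical, and the parameters `sigma` and `theta` are treated as known, whereas in practice they would have to be chosen or estimated. Smoothing and credible intervals are not covered.

```python
from collections import Counter


def frequency_counts(sample):
    """Return (n, k, m): sample size, number of distinct species,
    and m[l] = number of species observed exactly l times."""
    species_counts = Counter(sample)
    m = Counter(species_counts.values())
    return len(sample), len(species_counts), m


def good_turing(sample, l):
    """Unsmoothed Good-Turing estimate of the l-discovery probability:
    (l + 1) * m_{l+1} / n. With l = 0 this is the probability of a new species."""
    n, _, m = frequency_counts(sample)
    return (l + 1) * m.get(l + 1, 0) / n


def pitman_yor_discovery(sample, l, sigma, theta):
    """Posterior expected l-discovery probability under a two-parameter
    Poisson-Dirichlet prior (assumed: 0 <= sigma < 1, theta > -sigma).
    l = 0: (theta + sigma * k) / (theta + n)
    l >= 1: (l - sigma) * m_l / (theta + n)"""
    n, k, m = frequency_counts(sample)
    if l == 0:
        return (theta + sigma * k) / (theta + n)
    return (l - sigma) * m.get(l, 0) / (theta + n)


if __name__ == "__main__":
    # Toy sample: species labels are letters (illustrative data only).
    sample = list("aaabbcdde")
    for l in range(4):
        gt = good_turing(sample, l)
        py = pitman_yor_discovery(sample, l, sigma=0.5, theta=1.0)
        print(f"l={l}  Good-Turing={gt:.4f}  Pitman-Yor={py:.4f}")
```

Note the structural difference that the abstract's equivalence result has to bridge: the Good-Turing estimate for frequency l depends on m_{l+1}, while the Bayesian estimate depends on m_l (or on k when l = 0).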