A Maximum-Entropy Method to Estimate Discrete Distributions from Samples Ensuring Nonzero Probabilities

Abstract

When constructing discrete (binned) distributions from samples of a data set, some applications require that all bins of the sample distribution have nonzero probability. This is the case, for example, if the sample distribution is part of a predictive model that must return a response for the entire codomain, or if Kullback–Leibler divergence is used to measure the (dis-)agreement between the sample distribution and the original distribution of the variable, because the divergence is inconveniently infinite when the sample distribution contains zero-probability bins. Several sample-based distribution estimators assure nonzero bin probability, such as adding one counter to each zero-probability bin of the sample histogram, adding a small probability to the sample pdf, smoothing methods such as kernel-density smoothing, or Bayesian approaches based on the Dirichlet and multinomial distributions. Here, we suggest and test an approach based on the Clopper–Pearson method, which makes use of the binomial distribution. Based on the sample distribution, confidence intervals for the bin-occupation probability are calculated. The mean of each confidence interval is a strictly positive estimator of the true bin-occupation probability and converges with increasing sample size. For small samples, it converges towards a uniform distribution, i.e., the method effectively applies a maximum-entropy approach. We apply this nonzero method and four alternative sample-based distribution estimators to a range of typical distributions (uniform, Dirac, normal, multimodal, and irregular) and measure the effect with Kullback–Leibler divergence. While the performance of each method strongly depends on the distribution type it is applied to, on average, and especially for small sample sizes, the nonzero method, the simple “add one counter” method, and the Bayesian Dirichlet-multinomial model show very similar behavior and perform best. We conclude that, when estimating distributions without an a priori idea of their shape, applying one of these methods is favorable.
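
The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 95% confidence level (`alpha=0.05`), the bisection solver, and the final renormalization so that the bin estimates sum to one are all assumptions made for illustration. The sketch computes a Clopper–Pearson interval for each bin count, takes the midpoint of the interval as the bin estimate, and compares the Kullback–Leibler divergence of the raw histogram and of the nonzero estimate against an assumed true distribution.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def _root(g, iterations=60):
    """Bisection on [0, 1] for a function g that increases from <0 to >0."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided binomial confidence interval for k hits in n trials."""
    # Lower bound: p with P(X >= k | p) = alpha/2 (0 if the bin is empty).
    lower = 0.0 if k == 0 else _root(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper bound: p with P(X <= k | p) = alpha/2 (1 if the bin holds all samples).
    upper = 1.0 if k == n else _root(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper

def nonzero_estimate(counts, alpha=0.05):
    """Midpoint of each bin's interval, renormalized (assumption) to sum to 1."""
    n = sum(counts)
    mids = [sum(clopper_pearson(k, n, alpha)) / 2 for k in counts]
    total = sum(mids)
    return [m / total for m in mids]

def kl_divergence(p, q):
    """D(p || q) in nats; infinite if q has a zero bin where p has mass."""
    d = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        d += pi * math.log(pi / qi)
    return d

# Hypothetical example: 10 samples in 4 bins, two bins empty.
counts = [3, 0, 7, 0]
true_dist = [0.25, 0.25, 0.25, 0.25]  # assumed "original" distribution
raw_hist = [k / sum(counts) for k in counts]
estimate = nonzero_estimate(counts)

print(kl_divergence(true_dist, raw_hist))   # infinite: empty bins in raw_hist
print(kl_divergence(true_dist, estimate))   # finite: every bin is positive
```

Because the upper bound of the interval is positive even for an empty bin, every midpoint is strictly positive, which is what keeps the divergence finite in this example.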