Source: PLoS Clinical Trials

High throughput nonparametric probability density estimation


Abstract

In high throughput applications, such as those found in bioinformatics and finance, it is important to determine accurate probability distribution functions despite having only minimal information about data characteristics, and without relying on human subjectivity. Such an automated process for univariate data is implemented here by merging the maximum entropy method with single order statistics and maximum likelihood. The only required properties of the random variables are that they are continuous and that they are, or can be approximated as, independent and identically distributed. A quasi-log-likelihood function based on single order statistics for sampled uniform random data is used to empirically construct a sample-size-invariant universal scoring function. A probability density estimate is then determined by iteratively improving trial cumulative distribution functions, where better estimates are quantified by the scoring function, which identifies atypical fluctuations. This criterion resists both under- and over-fitting the data, serving as an alternative to the Bayesian or Akaike information criterion. Multiple estimates of the probability density reflect uncertainties due to statistical fluctuations in random samples. Scaled quantile residual plots are also introduced as an effective diagnostic for visualizing the quality of the estimated probability densities. Benchmark tests show that estimates of the probability density function (PDF) converge to the true PDF as sample size increases, even on particularly difficult test densities that include discontinuities, multi-resolution scales, heavy tails, and singularities. These results indicate that the method is generally applicable to high throughput statistical inference.
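The scaled quantile residual diagnostic mentioned in the abstract rests on a standard fact: if a trial CDF F matches the sampling distribution, then u_(r) = F(x_(r)) for the sorted samples behaves like the r-th uniform order statistic, with mean r/(n+1) and variance mu(1-mu)/(n+2). The sketch below is an illustration of that idea only, assuming this common scaling; the function name and details are not taken from the paper's implementation.

```python
import numpy as np

def scaled_quantile_residuals(samples, cdf):
    """Residuals of cdf(sorted samples) against uniform order-statistic
    means, scaled by the order-statistic standard deviation.

    Near-unit-magnitude fluctuations around zero suggest a good fit;
    large systematic trends flag a poor density estimate.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    r = np.arange(1, n + 1)
    u = cdf(x)                                  # trial CDF at sorted data
    mu = r / (n + 1.0)                          # mean of r-th uniform order statistic
    sigma = np.sqrt(mu * (1.0 - mu) / (n + 2.0))  # its standard deviation
    return (u - mu) / sigma

# Illustration on exponential data with a correct and an incorrect trial CDF.
rng = np.random.default_rng(0)
data = rng.exponential(scale=1.0, size=1000)
sqr_good = scaled_quantile_residuals(data, lambda x: 1.0 - np.exp(-x))
sqr_bad = scaled_quantile_residuals(data, lambda x: 1.0 - np.exp(-2.0 * x))
```

With the correct CDF the residuals stay of order one; with the mismatched rate they grow far beyond that, which is exactly the kind of atypical fluctuation the scoring function is meant to penalize.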
